## Dirichlet Prior Bayesian Estimators of Entropy, Mutual Information and Other Related Quantities

### Description

`freqs.Dirichlet` computes the Bayesian estimates of the bin frequencies using the Dirichlet-multinomial pseudocount model.

`entropy.Dirichlet` estimates the Shannon entropy H of the random variable Y from the corresponding observed counts `y` by plug-in of Bayesian estimates of the bin frequencies using the Dirichlet-multinomial pseudocount model.

`KL.Dirichlet` computes a Bayesian estimate of the Kullback-Leibler (KL) divergence from counts `y1` and `y2`.

`chi2.Dirichlet` computes a Bayesian version of the chi-squared divergence from counts `y1` and `y2`.

`mi.Dirichlet` computes a Bayesian estimate of mutual information of two random variables.

`chi2indep.Dirichlet` computes a Bayesian version of the chi-squared divergence of independence from a table of counts `y2d`.

### Usage

```
freqs.Dirichlet(y, a)
entropy.Dirichlet(y, a, unit=c("log", "log2", "log10"))
KL.Dirichlet(y1, y2, a1, a2, unit=c("log", "log2", "log10"))
chi2.Dirichlet(y1, y2, a1, a2, unit=c("log", "log2", "log10"))
mi.Dirichlet(y2d, a, unit=c("log", "log2", "log10"))
chi2indep.Dirichlet(y2d, a, unit=c("log", "log2", "log10"))
```

### Arguments

| Argument | Description |
| --- | --- |
| `y` | vector of counts. |
| `y1` | vector of counts. |
| `y2` | vector of counts. |
| `y2d` | matrix of counts. |
| `a` | pseudocount per bin. |
| `a1` | pseudocount per bin for first random variable. |
| `a2` | pseudocount per bin for second random variable. |
| `unit` | the unit in which entropy is measured. The default, `unit="log"`, gives entropy in nats (natural units). For entropy in bits set `unit="log2"`. |

### Details

The Dirichlet-multinomial pseudocount entropy estimator is a Bayesian plug-in estimator: in the definition of the Shannon entropy the bin probabilities are replaced by the respective Bayesian estimates of the frequencies, using a model with a Dirichlet prior and a multinomial likelihood.
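Written out, the estimator described above takes the following form. The symbols are introduced here for illustration only: $y_k$ is the count in bin $k$, $a_k$ the pseudocount for that bin (a scalar `a` corresponds to $a_k = a$ for every bin), $n = \sum_k y_k$ and $A = \sum_k a_k$.

$$
\hat{\theta}_k = \frac{y_k + a_k}{n + A},
\qquad
\hat{H} = -\sum_k \hat{\theta}_k \log \hat{\theta}_k
$$

With the natural logarithm $\hat{H}$ is in nats; dividing by $\log 2$ converts it to bits.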

The parameter `a` is a parameter of the Dirichlet prior, and in effect specifies the pseudocount per bin. Popular choices of `a` are:

• a=0: maximum likelihood estimator (see `entropy.empirical`)

• a=1/2: Jeffreys' prior; Krichevsky-Trofimov (1981) entropy estimator

• a=1: Laplace's prior

• a=1/length(y): Schurmann-Grassberger (1996) entropy estimator

• a=sqrt(sum(y))/length(y): minimax prior

The pseudocount `a` can also be a vector so that for each bin an individual pseudocount is added.
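To make the effect of the pseudocount concrete, here is a minimal base-R sketch that applies the pseudocount choices listed above to one count vector. The helpers `dirichlet_freqs` and `plugin_entropy` are hypothetical names written for this illustration; they are not the package's own code, and they assume the estimator is simply "add pseudocounts, normalise, plug in".

```r
# Hypothetical helper (not part of the package): frequency estimates obtained
# by adding pseudocount a (a scalar, or one value per bin) to the counts.
dirichlet_freqs <- function(y, a) {
  ya <- y + a          # add pseudocounts to the observed counts
  ya / sum(ya)         # normalise so the estimates sum to 1
}

# Hypothetical helper: plug-in Shannon entropy in nats, treating 0*log(0) as 0.
plugin_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# observed counts for each bin
y <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)

# the pseudocount choices from the list above
a_choices <- list(
  ml       = 0,
  jeffreys = 1/2,
  laplace  = 1,
  sg       = 1/length(y),
  minimax  = sqrt(sum(y))/length(y)
)

# entropy estimate (in nats) for each choice of a
H <- sapply(a_choices, function(a) plugin_entropy(dirichlet_freqs(y, a)))
print(H)

# entropy in bits for a=1/2
print(H["jeffreys"] / log(2))
```

Because a uniform pseudocount pulls the estimated frequencies towards the uniform distribution, larger values of `a` give larger entropy estimates for this count vector, bounded above by `log(length(y))`.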

### Value

`freqs.Dirichlet` returns the Bayesian estimates of the frequencies.

`entropy.Dirichlet` returns the Bayesian estimate of the Shannon entropy.

`KL.Dirichlet` returns the Bayesian estimate of the KL divergence.

`chi2.Dirichlet` returns the Bayesian version of the chi-squared divergence.

`mi.Dirichlet` returns the Bayesian estimate of the mutual information.

`chi2indep.Dirichlet` returns the Bayesian version of the chi-squared divergence of independence.
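The following base-R sketch shows how the two-sample and table-based quantities could be formed from Dirichlet frequency estimates. It assumes the standard plug-in definitions of the KL divergence, the chi-squared divergence and mutual information, and that for a table the pseudocount is added to every cell; all helper names are hypothetical and this is not the package's own code.

```r
# Hypothetical helper: pseudocount frequency estimates (works for vectors and matrices)
dirichlet_freqs <- function(y, a) (y + a) / sum(y + a)

# KL divergence in nats; assumes p2 > 0 wherever p1 > 0
kl_plugin <- function(p1, p2) {
  keep <- p1 > 0
  sum(p1[keep] * log(p1[keep] / p2[keep]))
}

# chi-squared divergence; assumes all entries of p2 are positive
chi2_plugin <- function(p1, p2) sum((p1 - p2)^2 / p2)

# two count vectors over the same bins
y1 <- c(4, 2, 3, 1, 10, 4)
y2 <- c(2, 3, 7, 1, 4, 3)
p1 <- dirichlet_freqs(y1, 1/6)
p2 <- dirichlet_freqs(y2, 1/6)
kl   <- kl_plugin(p1, p2)
chi2 <- chi2_plugin(p1, p2)

# contingency table: compare joint frequencies with the product of the marginals
y2d <- rbind(c(1, 2, 3), c(6, 5, 4))
p2d <- dirichlet_freqs(y2d, 1/6)                 # pseudocount added to every cell
p_indep <- outer(rowSums(p2d), colSums(p2d))     # frequencies under independence
mi         <- kl_plugin(as.vector(p2d), as.vector(p_indep))
chi2_indep <- chi2_plugin(as.vector(p2d), as.vector(p_indep))

print(c(KL = kl, half_chi2 = 0.5 * chi2, MI = mi, half_chi2indep = 0.5 * chi2_indep))
```

Half the chi-squared divergence is a second-order approximation of the KL divergence when the two distributions are close, which is why the two values are worth comparing side by side.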

### Author(s)

Korbinian Strimmer (http://www.strimmerlab.org).

### References

Agresti, A., and D. B. Hitchcock. 2005. Bayesian inference for categorical data analysis. Stat. Methods. Appl. 14:297–330.

Krichevsky, R. E., and V. K. Trofimov. 1981. The performance of universal encoding. IEEE Trans. Inf. Theory 27:199–207.

Schurmann, T., and P. Grassberger. 1996. Entropy estimation of symbol sequences. Chaos 6:414–427.

### See Also

`entropy`, `entropy.shrink`, `entropy.empirical`, `entropy.plugin`, `mi.plugin`, `KL.plugin`, `discretize`.

### Examples

```
# load entropy library
library("entropy")

# a single variable

# observed counts for each bin
y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)

# Dirichlet estimate of frequencies with a=1/2
freqs.Dirichlet(y, a=1/2)

# Dirichlet estimate of entropy with a=0
entropy.Dirichlet(y, a=0)

# identical to empirical estimate
entropy.empirical(y)

# Dirichlet estimate with a=1/2 (Jeffreys' prior)
entropy.Dirichlet(y, a=1/2)

# Dirichlet estimate with a=1 (Laplace prior)
entropy.Dirichlet(y, a=1)

# Dirichlet estimate with a=1/length(y)
entropy.Dirichlet(y, a=1/length(y))

# Dirichlet estimate with a=sqrt(sum(y))/length(y)
entropy.Dirichlet(y, a=sqrt(sum(y))/length(y))

# example with two variables

# observed counts for two random variables
y1 = c(4, 2, 3, 1, 10, 4)
y2 = c(2, 3, 7, 1, 4, 3)

# Bayesian estimate of Kullback-Leibler divergence (a=1/6)
KL.Dirichlet(y1, y2, a1=1/6, a2=1/6)

# half of the corresponding chi-squared divergence
0.5*chi2.Dirichlet(y1, y2, a1=1/6, a2=1/6)

# joint distribution example

# contingency table with counts for two discrete variables
y2d = rbind( c(1,2,3), c(6,5,4) )

# Bayesian estimate of mutual information (a=1/6)
mi.Dirichlet(y2d, a=1/6)

# half of the Bayesian chi-squared divergence of independence
0.5*chi2indep.Dirichlet(y2d, a=1/6)

```

