discrete_entropy {ForeCA}    R Documentation
Shannon entropy for discrete pmf
Description
Computes the Shannon entropy

\mathcal{H}(p) = -\sum_{i=1}^{n} p_i \log p_i

of a discrete random variable X taking values in \lbrace x_1, \ldots, x_n \rbrace with probability mass function (pmf) P(X = x_i) = p_i, where p_i \geq 0 for all i and \sum_{i=1}^{n} p_i = 1.
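For example, a fair coin with p = (1/2, 1/2) has \mathcal{H}(p) = -(\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2}) = 1 bit (with base = 2).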
Usage
discrete_entropy(
  probs,
  base = 2,
  method = c("MLE"),
  threshold = 0,
  prior.probs = NULL,
  prior.weight = 0
)
Arguments
probs: numeric; probabilities (empirical frequencies). Must be non-negative and add up to 1.

base: logarithm base; entropy is measured in "nats" for base = exp(1) and in "bits" for base = 2 (default).

method: string; method to estimate entropy; see Details below.

threshold: numeric; frequencies below threshold are set to 0; the default threshold = 0 means no thresholding.

prior.probs: optional; only used if prior.weight > 0. Prior distribution that is mixed with probs; by default a uniform distribution over all outcomes.

prior.weight: numeric; how much weight the prior distribution gets in the mixture between data and prior distribution. Must be between 0 and 1 (default: 0); see the usage sketch after this list.
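For illustration, a minimal usage sketch of these arguments (the probabilities are arbitrary; the comments state approximate results):

p <- c(0.7, 0.2, 0.1)
discrete_entropy(p)                      # plug-in estimate in bits (default base = 2)
discrete_entropy(p, base = exp(1))       # the same estimate in nats
discrete_entropy(p, prior.weight = 0.5)  # mixed with a uniform prior; pulled toward log2(3)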
Details
discrete_entropy uses a plug-in estimator (method = "MLE"):

\widehat{\mathcal{H}}(p) = -\sum_{i=1}^{n} \widehat{p}_i \log \widehat{p}_i.
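A minimal base-R sketch of this plug-in estimator (the helper name plugin_entropy is illustrative, not part of ForeCA):

plugin_entropy <- function(probs, base = 2) {
  stopifnot(all(probs >= 0), isTRUE(all.equal(sum(probs), 1)))
  probs <- probs[probs > 0]  # drop zeros: by convention 0 * log(0) = 0
  -sum(probs * log(probs, base = base))
}
plugin_entropy(rep(1 / 4, 4))  # = log2(4) = 2 bits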
If prior.weight > 0, then it mixes the observed proportions \widehat{p}_i with a prior distribution

\widehat{p}_i \leftarrow (1 - \lambda) \cdot \widehat{p}_i + \lambda \cdot prior_i, \quad i = 1, \ldots, n,

where \lambda \in [0, 1] is the prior.weight parameter. By default the prior is a uniform distribution, i.e., prior_i = \frac{1}{n} for all i.
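The smoothing step itself is just a convex combination of the empirical pmf and the prior, as in this short sketch (variable names are illustrative):

p.hat <- c(0.5, 0.3, 0.2, 0, 0)     # observed proportions
prior <- rep(1 / length(p.hat), 5)  # default: uniform prior
lambda <- 0.1                       # prior.weight
p.smooth <- (1 - lambda) * p.hat + lambda * prior
sum(p.smooth)  # still a valid pmf: sums to 1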
Note that this plug-in estimator is biased; it systematically underestimates the true entropy. See References for an overview of alternative methods.
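The downward bias is easy to see in a small simulation (an illustrative sketch, not taken from the package documentation):

set.seed(42)
p.true <- rep(1 / 4, 4)  # true entropy: log2(4) = 2 bits
estimates <- replicate(2000, {
  x <- sample(4, size = 20, replace = TRUE, prob = p.true)
  freqs <- as.numeric(table(factor(x, levels = 1:4))) / 20
  discrete_entropy(freqs)
})
mean(estimates)  # on average noticeably below 2 bits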
Value
numeric; non-negative real value.
References
Archer, E., Park, I. M., and Pillow, J. W. (2014). "Bayesian Entropy Estimation for Countable Discrete Distributions". Journal of Machine Learning Research (JMLR), 15, 2833-2868. Available at http://jmlr.org/papers/v15/archer14a.html.
Examples
set.seed(1)  # for reproducible example values
probs.tmp <- rexp(5)
probs.tmp <- sort(probs.tmp / sum(probs.tmp))
unif.distr <- rep(1/length(probs.tmp), length(probs.tmp))
matplot(cbind(probs.tmp, unif.distr), pch = 19,
ylab = "P(X = k)", xlab = "k")
matlines(cbind(probs.tmp, unif.distr))
legend("topleft", c("non-uniform", "uniform"), pch = 19,
lty = 1:2, col = 1:2, box.lty = 0)
discrete_entropy(probs.tmp)
# the uniform pmf has the largest entropy among all pmfs on a fixed
# finite support (here log2(5), since the default base is 2)
discrete_entropy(unif.distr)
# no uncertainty if one element occurs with probability 1
discrete_entropy(c(1, 0, 0))
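# measure in nats instead of bits (a usage sketch of the base argument)
discrete_entropy(unif.distr, base = exp(1))  # = log(5)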