Entropy {DescTools} | R Documentation |
Shannon Entropy and Mutual Information
Description
Computes Shannon entropy and the mutual information of two variables. The entropy quantifies the expected value of the information contained in a vector. The mutual information is a quantity that measures the mutual dependence of the two random variables.
Usage
Entropy(x, y = NULL, base = 2, ...)
MutInf(x, y, base = 2, ...)
Arguments
x |
a vector or a matrix of numerical or categorical type. If only x is supplied, it will be interpreted as a contingency table. |
y |
a vector with the same type and dimension as x. If y is not NULL, the entropy of table(x, y, ...) is calculated. |
base |
base of the logarithm to be used, defaults to 2. |
... |
further arguments are passed to the function table. |
Details
The Shannon entropy equation provides a way to estimate the average minimum number of bits needed to encode a string of symbols, based on the frequency of the symbols.
It is given by the formula H = -\sum_i \pi_i \log(\pi_i), where \pi_i is the probability of character number i showing up in a stream of characters of the given "script".
The entropy ranges from 0 to Inf.
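The formula above can be sketched in base R without DescTools. The helper name shannon_entropy is hypothetical and is not part of the package. It assumes its input is a vector of non-negative counts and treats 0 * log(0) as 0, which is the usual convention but is not confirmed here to be how Entropy handles empty cells.

```r
# Hypothetical helper (not part of DescTools): Shannon entropy from counts
shannon_entropy <- function(counts, base = 2) {
  p <- counts / sum(counts)        # relative frequencies
  p <- p[p > 0]                    # drop empty cells: 0 * log(0) is taken as 0
  -sum(p * log(p, base = base))    # H = -sum_i p_i * log(p_i)
}

shannon_entropy(rep(1, 8))     # 8 equally likely symbols: 3 bits
shannon_entropy(c(10, 0, 0))   # a single observed symbol: 0
```

With 8 equally likely symbols the result is log2(8) = 3 bits, which matches the first call in the Examples section.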
Value
a numeric value.
Author(s)
Andri Signorell <andri@signorell.net>
References
Shannon, Claude E. (July/October 1948). A Mathematical Theory of Communication, Bell System Technical Journal 27 (3): 379-423.
Ihara, Shunsuke (1993) Information theory for continuous systems, World Scientific. p. 2. ISBN 978-981-02-0985-8.
See Also
package entropy which implements various estimators of entropy
Examples
Entropy(as.matrix(rep(1/8, 8)))
# http://r.789695.n4.nabble.com/entropy-package-how-to-compute-mutual-information-td4385339.html
x <- as.factor(c("a","b","a","c","b","c"))
y <- as.factor(c("b","a","a","c","c","b"))
Entropy(table(x), base=exp(1))
Entropy(table(y), base=exp(1))
Entropy(x, y, base=exp(1))
# Mutual information is
Entropy(table(x), base=exp(1)) + Entropy(table(y), base=exp(1)) - Entropy(x, y, base=exp(1))
MutInf(x, y, base=exp(1))
Entropy(table(x)) + Entropy(table(y)) - Entropy(x, y)
MutInf(x, y, base=2)
# http://en.wikipedia.org/wiki/Cluster_labeling
tab <- matrix(c(60,10000,200,500000), nrow=2, byrow=TRUE)
MutInf(tab, base=2)
d.frm <- Untable(as.table(tab))
str(d.frm)
MutInf(d.frm[,1], d.frm[,2])
table(d.frm[,1], d.frm[,2])
MutInf(table(d.frm[,1], d.frm[,2]))
# Ranking mutual information can help to describe clusters
#
# r.mi <- MutInf(x, grp)
# attributes(r.mi)$dimnames <- attributes(tab)$dimnames
#
# # calculating ranks of mutual information
# r.mi_r <- apply( -r.mi, 2, rank, na.last=TRUE )
# # show only first 6 ranks
# r.mi_r6 <- ifelse( r.mi_r < 7, r.mi_r, NA)
# attributes(r.mi_r6)$dimnames <- attributes(tab)$dimnames
# r.mi_r6
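The identity used in the examples above, mutual information = H(x) + H(y) - H(x, y), can also be checked in base R against the direct definition sum p(x,y) * log(p(x,y) / (p(x) * p(y))). This is a minimal sketch under the assumption that both quantities are estimated from the plain relative frequencies of table(x, y). The helper H and the variable names are hypothetical and not part of DescTools.

```r
x <- c("a", "b", "a", "c", "b", "c")
y <- c("b", "a", "a", "c", "c", "b")

# Hypothetical helper: entropy of a table (or vector) of counts
H <- function(tab, base = 2) {
  p <- tab / sum(tab)
  p <- p[p > 0]                      # 0 * log(0) is taken as 0
  -sum(p * log(p, base = base))
}

joint <- table(x, y)                 # joint counts

# Via the entropy identity: H(x) + H(y) - H(x, y)
mi_identity <- H(rowSums(joint)) + H(colSums(joint)) - H(joint)

# Via the direct definition
pxy   <- joint / sum(joint)                    # joint probabilities
indep <- outer(rowSums(pxy), colSums(pxy))     # p(x) * p(y)
nz    <- pxy > 0                               # skip empty cells
mi_direct <- sum(pxy[nz] * log2(pxy[nz] / indep[nz]))

all.equal(mi_identity, mi_direct)    # both routes agree
```

For these two vectors every observed (x, y) pair occurs exactly once, so both routes reduce to log2(1.5).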