knn_mi {rmi} | R Documentation
kNN Mutual Information Estimators
Description
Computes mutual information based on the distribution of nearest neighborhood distances. The available methods are KSG1 and KSG2, as described by Kraskov et al. (2004), and the Local Non-Uniformity Corrected (LNC) KSG, as described by Gao et al. (2015). The LNC method is based on KSG2 but applies PCA-based volume corrections to adjust for observed non-uniformity in the local neighborhood of each point in the sample.
Usage
knn_mi(data, splits, options)
Arguments
data | Matrix of sample observations; each row is an observation. |
splits | A vector that describes which sets of columns in data to treat as variables; for example, splits = c(1,1) treats each of two columns as a separate variable, while splits = c(2,1) treats the first two columns as one bivariate variable. |
options | A list that specifies the estimator and its necessary parameters (see Details). |
Details
The currently available methods are LNC, KSG1, and KSG2.
For KSG1 use: options = list(method = "KSG1", k = 5)
For KSG2 use: options = list(method = "KSG2", k = 5)
For LNC use: options = list(method = "LNC", k = 10, alpha = 0.65); the LNC correction requires k > ncol(data).
Author
Isaac Michaud, North Carolina State University, ijmichau@ncsu.edu
References
Gao, S., Ver Steeg, G., & Galstyan, A. (2015). Efficient estimation of mutual information for strongly dependent variables. Artificial Intelligence and Statistics: 277-286.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6): 066138.
Examples
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)
knn_mi(cbind(x, y), c(1, 1), options = list(method = "KSG2", k = 6))

set.seed(123)
x <- rnorm(1000)
y <- 100*x + rnorm(1000)
knn_mi(cbind(x, y), c(1, 1), options = list(method = "LNC", alpha = 0.65, k = 10))
# approximate analytic value of mutual information
-0.5*log(1 - cor(x, y)^2)

z <- rnorm(1000)
# redundancy I(x;y;z) is approximately the same as I(x;y)
knn_mi(cbind(x, y, z), c(1, 1, 1), options = list(method = "LNC", alpha = c(0.5, 0, 0, 0), k = 10))
# mutual information I((x,y);z) is approximately 0
knn_mi(cbind(x, y, z), c(2, 1), options = list(method = "LNC", alpha = c(0.5, 0.65, 0), k = 10))
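KSG1 is the one listed method not exercised above; it follows the same calling pattern. A minimal sketch, assuming the rmi package is installed, comparing its estimate against the analytic mutual information of a correlated bivariate Gaussian (the choice k = 5 follows the Details section):

```r
library(rmi)  # assumed available; provides knn_mi()

set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)  # correlation^2 = 0.5, so true MI = -0.5*log(0.5)

# KSG1 estimate of I(x; y)
est <- knn_mi(cbind(x, y), c(1, 1), options = list(method = "KSG1", k = 5))

# analytic value for a bivariate Gaussian: -0.5 * log(1 - rho^2)
analytic <- -0.5*log(1 - cor(x, y)^2)

est
analytic
```

With 1000 samples the KSG1 estimate should land close to the analytic value, though it carries the usual small-sample bias of nearest-neighbor estimators.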