MINTsemiperm {semidist}    R Documentation

Mutual information independence test (categorical-continuous case)

Description

Implements the mutual information independence test (MINT) of Berrett and Samworth (2019), with a modified estimator of the mutual information (MI) between a categorical random variable and a continuous random variable. The modification is based on the idea of Ross (2014).
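To make the idea of Ross (2014) concrete, here is a minimal base-R sketch of a nearest-neighbor MI estimator for a continuous X and a categorical y. The function name ross_mi is hypothetical and is not part of semidist; the package's modified estimator may differ in its details.

```r
# Sketch of a Ross (2014)-style kNN estimator of the MI (in nats) between a
# continuous X and a categorical y. Hypothetical helper, not the semidist code.
ross_mi <- function(X, y, k = 3) {
  X <- as.matrix(X)                       # a vector becomes an n-by-1 matrix
  y <- droplevels(as.factor(y))
  n <- nrow(X)
  stopifnot(length(y) == n, all(table(y) > k))
  D <- as.matrix(dist(X))                 # pairwise Euclidean distances
  n_class <- as.numeric(table(y)[as.character(y)])  # class size of each point
  m <- numeric(n)
  for (i in seq_len(n)) {
    same <- setdiff(which(y == y[i]), i)  # other points in the same class
    d_k <- sort(D[i, same])[k]            # distance to k-th same-class neighbor
    m[i] <- sum(D[i, -i] <= d_k)          # points of any class within d_k
  }
  digamma(n) - mean(digamma(n_class)) + digamma(k) - mean(digamma(m))
}

# Simulated data in which X clearly depends on y
set.seed(1)
x_sim <- c(rnorm(50), rnorm(50, mean = 3))
y_sim <- factor(rep(c("a", "b"), each = 50))
mi_hat <- ross_mi(x_sim, y_sim, k = 3)
```

The estimate grows as the class-conditional distributions of X separate, and stays near zero when X and y are independent.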

MINTsemiperm() implements the permutation independence test via mutual information; the parameter k must be specified in advance.
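The general shape of a permutation independence test can be sketched in base R as follows. This is an illustration only: perm_pvalue and between_ss are hypothetical names, and the between-group sum of squares is a stand-in for the MI statistic, not what MINTsemiperm() computes.

```r
# Generic permutation p-value: compare the observed statistic with its
# values after randomly permuting the labels y. Hypothetical helper.
perm_pvalue <- function(X, y, stat_fun, B = 1000) {
  t_obs <- stat_fun(X, y)
  t_perm <- replicate(B, stat_fun(X, sample(y)))  # sample(y) shuffles labels
  (1 + sum(t_perm >= t_obs)) / (B + 1)            # always in (0, 1]
}

# Stand-in statistic for a univariate X: between-group sum of squares
between_ss <- function(X, y) {
  means <- tapply(X, y, mean)
  sizes <- tapply(X, y, length)
  sum(sizes * (means - mean(X))^2)
}

set.seed(1)
x_sim <- c(rnorm(30), rnorm(30, mean = 2))
y_sim <- factor(rep(c("a", "b"), each = 30))
p_val <- perm_pvalue(x_sim, y_sim, between_ss, B = 199)
```

Adding 1 to both the numerator and the denominator keeps the p-value strictly positive, so its smallest possible value is 1 / (B + 1).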

MINTsemiauto() selects an appropriate k automatically by a data-driven procedure, then runs MINTsemiperm() with the chosen k.

Usage

MINTsemiperm(X, y, k, B = 1000)

MINTsemiauto(X, y, kmax, B1 = 1000, B2 = 1000)

Arguments

X

Data for the continuous variables: an n-by-p matrix, or a vector of length n in the univariate case.

y

Data for the categorical variable: a factor of length n.

k

Number of nearest neighbors. See References for details.

B, B1, B2

Number of permutations to use. Each defaults to 1000.

kmax

Maximum value of k considered in the automatic search for the optimal k.
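As an illustration of the shapes described above, the following base-R sketch coerces inputs to an n-by-p numeric matrix and a factor of length n and checks that they agree. The explicit as.matrix() call is an assumption made for clarity; this page does not state whether the functions also accept a data frame directly.

```r
# Multivariate case: n-by-p numeric matrix and a factor of length n
X <- as.matrix(mtcars[, c("mpg", "disp", "drat", "wt")])
y <- factor(mtcars[, "am"])
stopifnot(is.numeric(X), is.factor(y), nrow(X) == length(y))

# Univariate case: a plain numeric vector of length n
x_uni <- mtcars[["mpg"]]
stopifnot(is.numeric(x_uni), length(x_uni) == length(y))
```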

Value

A list with class "indtest" containing the following components

For MINTsemiauto(), the list also contains

References

  1. Berrett, Thomas B., and Richard J. Samworth. "Nonparametric independence testing via mutual information." Biometrika 106, no. 3 (2019): 547-566.

  2. Ross, Brian C. "Mutual information between discrete and continuous data sets." PLoS ONE 9, no. 2 (2014): e87357.

Examples

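# Continuous variables and a two-level factor from the built-in mtcars data
X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])

# Permutation test with k = 5 and the default B = 1000 permutations
MINTsemiperm(X, y, 5)
# Automatic choice of k, searching up to kmax = 32
MINTsemiauto(X, y, kmax = 32)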


[Package semidist version 0.1.0 Index]