R: Mutual Information for Selecting Features

do.mifs {Rdimtools}

R Documentation

Mutual Information for Selecting Features

Description

MIFS is a supervised feature selection method that iteratively grows the subset of selected variables by adding, at each step, the most informative remaining feature as measured by mutual information.
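
The greedy procedure described above can be sketched in base R. This is an illustrative sketch, not the code used inside `do.mifs`: the helper names `mi_discrete` and `mifs_sketch` are hypothetical, equal-width binning with `cut` is an assumption, and mutual information is estimated with a simple plug-in formula. Each step picks the candidate `f` that maximizes `I(label; f) - beta * sum of I(f; s)` over the already-selected features `s`, which is the criterion given in Battiti (1994).

```r
# Plug-in mutual information (in nats) between two discrete vectors.
mi_discrete <- function(a, b) {
  joint <- table(a, b) / length(a)                 # joint probabilities
  indep <- outer(rowSums(joint), colSums(joint))   # product of the marginals
  nz <- joint > 0                                  # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Greedy MIFS-style selection (hypothetical helper, assumes ndim <= ncol(X)).
mifs_sketch <- function(X, label, ndim = 2, beta = 0.75, nbins = 10) {
  # Assumption: discretize every column into `nbins` equal-width bins.
  Xd <- apply(X, 2, function(v) as.integer(cut(v, breaks = nbins)))
  relevance <- apply(Xd, 2, mi_discrete, b = label)  # I(label; feature)
  selected <- integer(0)
  candidates <- seq_len(ncol(X))
  while (length(selected) < ndim) {
    score <- sapply(candidates, function(j) {
      # Redundancy: total MI between candidate j and the chosen features.
      redundancy <- sum(vapply(selected,
                               function(s) mi_discrete(Xd[, j], Xd[, s]),
                               numeric(1)))
      relevance[j] - beta * redundancy
    })
    best <- candidates[which.max(score)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

idx <- mifs_sketch(as.matrix(iris[, 1:4]), iris$Species, beta = 0.75)
```

With `beta = 0` the redundancy term vanishes and the sketch simply ranks features by their mutual information with the label; larger values penalize candidates that overlap with features already chosen.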

Usage

do.mifs(
  X,
  label,
  ndim = 2,
  beta = 0.75,
  discretize = c("default", "histogram"),
  preprocess = c("null", "center", "scale", "cscale", "whiten", "decorrelate")
)

Arguments

`X`	an `(n\times p)` matrix or data frame whose rows are observations and columns represent independent variables.
`label`	a length-`n` vector of class labels.
`ndim`	an integer-valued target dimension.
`beta`	penalty weight on the mutual information between a candidate feature and the already-chosen features at each iteration. Battiti (1994) suggests a value in `(0.5,1)`.
`discretize`	the method used to discretize each variable. `"default"` uses 10 bins, as proposed in the paper, while `"histogram"` chooses the number of bins automatically via Sturges' method.
`preprocess`	an additional option for preprocessing the data. Default is `"null"`. See also `aux.preprocess` for more details.
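
To make the two `discretize` options more concrete, the following base R sketch bins a single variable both ways. It only illustrates the idea of a fixed 10-bin grid versus a bin count from Sturges' rule; the use of `cut` with equal-width bins is an assumption, and the internal binning of `do.mifs` may differ in detail.

```r
# One numeric variable from the iris data.
x <- iris$Petal.Length

# Fixed grid: 10 equal-width bins, in the spirit of "default".
bins_fixed <- cut(x, breaks = 10)

# Sturges' rule: ceiling(log2(n) + 1) bins, in the spirit of "histogram".
k <- nclass.Sturges(x)
bins_sturges <- cut(x, breaks = k)

nlevels(bins_fixed)    # 10
k                      # 9 for n = 150 observations
```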

Value

a named list containing

Y: an (n\times ndim) matrix whose rows are embedded observations.
featidx: a length-ndim vector of indices with highest scores.
trfinfo: a list containing information for out-of-sample prediction.
projection: a (p\times ndim) matrix whose columns are basis vectors for projection.
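
For a feature selection method, one natural way for `featidx`, `projection`, and `Y` to fit together is through a 0/1 indicator matrix. The sketch below assumes that form and assumes no preprocessing (`preprocess = "null"`); the `featidx` value is made up for illustration and is not the output of `do.mifs`.

```r
X <- as.matrix(iris[, 1:4])
featidx <- c(3, 4)   # hypothetical selection, for illustration only

# Indicator matrix: column k has a single 1 in row featidx[k].
P <- matrix(0, nrow = ncol(X), ncol = length(featidx))
P[cbind(featidx, seq_along(featidx))] <- 1

# Projecting with P is the same as picking the selected columns.
Y <- X %*% P
all.equal(unname(X[, featidx]), unname(Y))
```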

Author(s)

Kisung You

References

Battiti R (1994). “Using Mutual Information for Selecting Features in Supervised Neural Net Learning.” IEEE Transactions on Neural Networks, 5(4), 537–550. ISSN 10459227.

Examples


## use iris data
## it is known that features 3 and 4 are more important.
data(iris)
iris.dat = as.matrix(iris[,1:4])
iris.lab = as.factor(iris[,5])

## try different beta values
out1 = do.mifs(iris.dat, iris.lab, beta=0)
out2 = do.mifs(iris.dat, iris.lab, beta=0.5)
out3 = do.mifs(iris.dat, iris.lab, beta=1)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(out1$Y, pch=19, col=iris.lab, main="beta=0")
plot(out2$Y, pch=19, col=iris.lab, main="beta=0.5")
plot(out3$Y, pch=19, col=iris.lab, main="beta=1")
par(opar)

[Package Rdimtools version 1.1.2 Index]