do.fssem {Rdimtools} | R Documentation |
Feature Subset Selection using Expectation-Maximization
Description
Feature Subset Selection using Expectation-Maximization (FSSEM) takes a wrapper approach to feature selection problem.
It iterates over optimizing the selection of variables by incrementally including each variable that adds the most
significant amount of scatter separability from a labeling obtained by Gaussian mixture model. This method is
quite computation intensive as it pertains to multiple fitting of GMM. Setting smaller max.k
for each round of
EM algorithm as well as target dimension ndim
would ease the burden.
Usage
do.fssem(
X,
ndim = 2,
max.k = 10,
preprocess = c("null", "center", "scale", "cscale", "whiten", "decorrelate")
)
Arguments
X |
an |
ndim |
an integer-valued target dimension. |
max.k |
maximum number of clusters for GMM fitting with EM algorithms. |
preprocess |
an additional option for preprocessing the data.
Default is "null". See also |
Value
a named list containing
- Y
an
(n\times ndim)
matrix whose rows are embedded observations.- trfinfo
a list containing information for out-of-sample prediction.
- projection
a
(p\times ndim)
whose columns are basis for projection.
Author(s)
Kisung You
References
Dy JG, Brodley CE (2004). “Feature Selection for Unsupervised Learning.” J. Mach. Learn. Res., 5, 845–889.
Examples
## run FSSEM with IRIS dataset - select 2 of 4 variables
data(iris)
irismat = as.matrix(iris[,2:4])
## select 50 observations for CRAN-purpose small example
id50 = sample(1:nrow(irismat), 50)
sel.dat = irismat[id50,]
sel.lab = as.factor(iris[id50,5])
## run and visualize
out0 = do.fssem(sel.dat, ndim=2, max.k=3)
opar = par(no.readonly=TRUE)
plot(out0$Y, main="small run", col=sel.lab, pch=19)
par(opar)
## Not run:
## NOT-FOR-CRAN example; run at your machine !
## try different maximum number of clusters
out3 = do.fssem(irismat, ndim=2, max.k=3)
out6 = do.fssem(irismat, ndim=2, max.k=6)
out9 = do.fssem(irismat, ndim=2, max.k=9)
## visualize
cols = as.factor(iris[,5])
opar = par(no.readonly=TRUE)
par(mfrow=c(3,1))
plot(out3$Y, main="max k=3", col=cols)
plot(out6$Y, main="max k=6", col=cols)
plot(out9$Y, main="max k=9", col=cols)
par(opar)
## End(Not run)