R: Feature Selection by PCA

fs.pca {mt}

R Documentation

Feature Selection by PCA

Description

Feature selection using PCA loadings.

Usage

  fs.pca(x, thres = 0.8, ...)

Arguments

`x`	A data frame or matrix containing the data set.
`thres`	The threshold on the cumulative percentage of variance explained by the PCs.
`...`	Additional arguments to `prcomp`.

Details

Since the PCA loadings form a matrix with one column per PC, the Mahalanobis distance of the loadings is used to select the features. (Other summaries, for example the sum of the absolute values of the loadings or the square root of the loadings, could be used instead.)

Note that this feature selection method is unsupervised: it does not use class labels.
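
The idea described above can be illustrated with a minimal sketch. This is not the code of `fs.pca`. The helper name `pca_loading_rank`, the rule for choosing the number of PCs, the use of the classical mean and covariance of the loadings, and the convention that a larger distance means a better rank are all assumptions made for illustration.

```r
## Minimal sketch (hypothetical helper, not the mt implementation):
## rank features by the Mahalanobis distance of their PCA loadings.
pca_loading_rank <- function(x, thres = 0.8, ...) {
  x   <- as.matrix(x)
  pca <- prcomp(x, ...)

  ## cumulative proportion of variance explained by the PCs
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

  ## assumed rule: keep the fewest PCs whose cumulative variance reaches thres
  k <- min(sum(cum_var < thres) + 1, length(cum_var))

  ## loadings: one row per feature, one column per retained PC
  lod <- pca$rotation[, seq_len(k), drop = FALSE]

  ## squared Mahalanobis distance of each feature's loading vector from the
  ## centre of all loading vectors (classical mean and covariance; cov(lod)
  ## can be singular if every PC is retained)
  stats <- mahalanobis(lod, center = colMeans(lod), cov = cov(lod))

  ## assumed convention: a larger distance means a more important feature
  ord <- order(stats, decreasing = TRUE)
  list(fs.rank  = rank(-stats),
       fs.order = rownames(lod)[ord],
       stats    = stats,
       n.pc     = k)
}

## usage on a built-in data set; scale. = TRUE is passed on to prcomp
res <- pca_loading_rank(iris[, 1:4], thres = 0.8, scale. = TRUE)
res$fs.order
```

The returned names mirror those listed under Value only to make the sketch easier to compare with `fs.pca`; the actual contents of those components may differ.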

Value

A list with components:

`fs.rank`	A vector of feature ranking scores.
`fs.order`	A vector of feature order from best to worst.
`stats`	A vector of measurements.

Author(s)

Wanchang Lin

Examples

## prepare data set
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]

## replace zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls) 
mv    ## View the missing value pattern

## keep only variables with less than 15% missing values
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)

## fill NAs with mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select class "1" and "2" for feature ranking
ind <- cls %in% c("1", "2")
mat <- dat[ind,,drop=FALSE]
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]

## feature selection by PCA on the full data set (unsupervised, so grp is not used)
res <- fs.pca(dat)
names(res)

[Package mt version 2.0-1.20 Index]