fs.pls {mt}

R Documentation

Feature Selection Using PLS

Description

Feature selection using the regression coefficients and the VIP (variable importance in projection) values of partial least squares (PLS).

Usage

  fs.pls(x,y, pls="simpls",ncomp=10,...)
  fs.plsvip(x,y, ncomp=10,...)
  fs.plsvip.1(x,y, ncomp=10,...)
  fs.plsvip.2(x,y, ncomp=10,...)

Arguments

`x`	A data frame or matrix containing the data set.
`y`	A factor or vector of class labels.
`pls`	The method used to calculate PLS scores and loadings. Supported methods: `simpls` (SIMPLS algorithm), `kernelpls` (kernel algorithm) and `oscorespls` (orthogonal scores algorithm). For details, see `simpls.fit`, `kernelpls.fit` and `oscorespls.fit` in package pls.
`ncomp`	The number of components to be used.
`...`	Arguments passed to or from other methods.

Details

fs.pls ranks the features by their PLS regression coefficients. For a classification problem the class labels are encoded as a dummy multiple-response matrix, so the coefficients form a matrix (one row per feature, one column per response) rather than a vector. To reduce each row to a single score, the Mahalanobis distance of the coefficients is used to select the features. (Other summaries, for example the sum of the absolute values of the coefficients or the square root of the coefficients, can also be used.)
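The following is a minimal base-R sketch of the scoring idea described above, not the code used inside fs.pls. It assumes a made-up coefficient matrix (`coef_mat`, random numbers rather than a fitted PLS model) and scores each feature by the squared Mahalanobis distance of its coefficient row from the column means; the names `coef_mat`, `d2`, `order_by_dist` and `abs_score` are hypothetical.

```r
## Hypothetical coefficient matrix: 6 features (rows) x 3 dummy responses (columns).
## Random values stand in for coefficients from a fitted PLS model.
set.seed(1)
coef_mat <- matrix(rnorm(18), nrow = 6,
                   dimnames = list(paste0("f", 1:6), paste0("class", 1:3)))

## Squared Mahalanobis distance of each feature's coefficient row
## from the column means, using the sample covariance of the rows.
d2 <- mahalanobis(coef_mat, center = colMeans(coef_mat), cov = cov(coef_mat))
names(d2) <- rownames(coef_mat)

## Larger distance = more important; order features from best to worst.
order_by_dist <- names(sort(d2, decreasing = TRUE))

## Alternative score mentioned in the text: sum of absolute coefficients.
abs_score <- rowSums(abs(coef_mat))
```

With real data the coefficient columns may be strongly correlated, in which case the covariance matrix can be close to singular and a different summary (such as `abs_score`) may be more stable.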

fs.plsvip and fs.plsvip.1 carry out feature selection based on the Mahalanobis distance and the absolute values of the PLS VIP, respectively.

fs.plsvip.2 is similar to fs.plsvip and fs.plsvip.1, but the categorical response is not treated as a dummy multiple-response matrix.
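To make the distinction concrete, the sketch below shows what a dummy multiple-response matrix looks like next to a single-column coding of the same factor. It is an illustration only: the toy factor `y` is invented, and the single numeric coding is an assumption about what a non-dummy response might look like, not a statement of how fs.plsvip.2 encodes the classes internally.

```r
## Toy class labels (hypothetical data).
y <- factor(c("a", "b", "c", "a", "b"))

## Dummy (indicator) response matrix: one 0/1 column per class,
## so a regression on it yields one coefficient column per class.
y_dummy <- model.matrix(~ y - 1)
colnames(y_dummy) <- levels(y)

## Single-column numeric coding of the same labels
## (assumed here for illustration only).
y_single <- as.numeric(y)
```

With the dummy coding each feature receives several values (one per response column) that must be summarised into one score, whereas a single-column response gives one value per feature directly.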

Value

A list with components:

`fs.rank`	A vector of feature ranking scores.
`fs.order`	A vector of feature order from best to worst.
`stats`	A vector of measurements.

Author(s)

Wanchang Lin

Examples

## prepare data set
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]

## replace zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls) 
mv    ## View the missing value pattern

## filter missing value variables
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)

## fill NAs with mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE] 
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]   

## apply PLS methods for feature selection
res.pls      <- fs.pls(mat,grp, ncomp=4)
res.plsvip   <- fs.plsvip(mat,grp, ncomp=4)
res.plsvip.1 <- fs.plsvip.1(mat,grp, ncomp=4)
res.plsvip.2 <- fs.plsvip.2(mat,grp, ncomp=4)

## check differences among these methods
fs.order <- data.frame(pls      = res.pls$fs.order,
                       plsvip   = res.plsvip$fs.order,
                       plsvip.1 = res.plsvip.1$fs.order,
                       plsvip.2 = res.plsvip.2$fs.order)
head(fs.order, 20)

[Package mt version 2.0-1.20 Index]