fs.pls {mt} | R Documentation |
Feature Selection Using PLS
Description
Feature selection using the regression coefficients and VIP (variable importance in projection) values of PLS.
Usage
fs.pls(x,y, pls="simpls",ncomp=10,...)
fs.plsvip(x,y, ncomp=10,...)
fs.plsvip.1(x,y, ncomp=10,...)
fs.plsvip.2(x,y, ncomp=10,...)
Arguments
x |
A data frame or matrix containing the data set. |
y |
A factor or vector of class labels. |
pls |
A method for calculating PLS scores and loadings. The following methods are supported:
For details, see |
ncomp |
The number of components to be used. |
... |
Arguments passed to or from other methods. |
Details
fs.pls ranks the features by the PLS regression coefficients. Because the
class response is coded as a dummy (indicator) matrix with multiple response
columns for the classification problem, the coefficients form a matrix rather
than a vector, so the Mahalanobis distance of the coefficients is used to score
the features. (Other summaries, for example the sum of absolute values of the
coefficients or the square root of the coefficients, could be used instead.)
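The idea of collapsing a coefficient matrix into one score per feature can be sketched in base R. This is a minimal illustration only, not the code of fs.pls: the matrix coef_mat is a random stand-in for PLS coefficients, and the names coef_mat, score, feat_order and feat_rank are hypothetical.

```r
# Toy stand-in for a PLS coefficient matrix: 6 features x 3 response columns.
# The values are random and purely illustrative, not output from fs.pls.
set.seed(1)
coef_mat <- matrix(rnorm(18), nrow = 6,
                   dimnames = list(paste0("feat", 1:6), paste0("class", 1:3)))

# One score per feature: the squared Mahalanobis distance of its coefficient
# row from the column means, scaled by the covariance of the columns.
score <- mahalanobis(coef_mat, center = colMeans(coef_mat), cov = cov(coef_mat))
names(score) <- rownames(coef_mat)

# Assumption: a larger distance means a more important feature.
feat_order <- names(sort(score, decreasing = TRUE))  # best to worst
feat_rank  <- rank(-score)                           # 1 = best
```

Note that mahalanobis() inverts the covariance matrix, so the columns of the coefficient matrix must not be linearly dependent for this sketch to work.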
fs.plsvip and fs.plsvip.1 carry out feature selection based on the
Mahalanobis distance and the absolute values of the PLS VIP, respectively.
fs.plsvip.2 is similar to fs.plsvip and fs.plsvip.1, but the categorical
response is not treated as a dummy multiple-response matrix.
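For readers unfamiliar with VIP, the conventional single-response formula can be sketched as follows. This is an assumption about the general technique, not a statement of how fs.plsvip, fs.plsvip.1 or fs.plsvip.2 compute it; the function vip_sketch and its arguments W, S and q are hypothetical, and the inputs are random toy values rather than a fitted PLS model.

```r
# Sketch of the conventional single-response VIP formula.
# W: p x A weight matrix, S: n x A score matrix,
# q: length-A vector of Y-loadings. All names here are hypothetical.
vip_sketch <- function(W, S, q) {
  p  <- nrow(W)
  ss <- q^2 * colSums(S^2)                 # response variation explained per component
  w2 <- sweep(W^2, 2, colSums(W^2), "/")   # squared weights, normalised per component
  drop(sqrt(p * (w2 %*% ss) / sum(ss)))    # one VIP value per feature
}

# Random toy inputs: 5 features, 8 samples, 2 components.
set.seed(2)
W <- matrix(rnorm(5 * 2), nrow = 5, dimnames = list(paste0("feat", 1:5), NULL))
S <- matrix(rnorm(8 * 2), nrow = 8)
q <- c(0.9, 0.4)

vip <- vip_sketch(W, S, q)
vip_order <- order(vip, decreasing = TRUE)  # feature indices, largest VIP first
```

With this definition the squared VIP values sum to the number of features, which is why a VIP above 1 is often read as "more important than average".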
Value
A list with components:
fs.rank |
A vector of feature ranking scores. |
fs.order |
A vector of feature order from best to worst. |
stats |
A vector of measurements. |
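A typical use of the returned list is to keep only the top-ranked features. The sketch below uses a hand-made stand-in list instead of a real result; it assumes that fs.order holds feature names (integer column indices would work with the same subsetting) and that fs.rank holds rank positions. The objects res, x, k, top and x_top are hypothetical.

```r
# Hypothetical stand-in for a returned list; the values are invented for
# illustration and were not produced by fs.pls.
res <- list(
  fs.rank  = c(f1 = 3, f2 = 1, f3 = 4, f4 = 2),
  fs.order = c("f2", "f4", "f1", "f3"),
  stats    = c(f1 = 0.8, f2 = 2.5, f3 = 0.1, f4 = 1.7)
)

# Toy data matrix whose column names match the feature names.
x <- matrix(seq_len(20), nrow = 5, dimnames = list(NULL, paste0("f", 1:4)))

# Keep the top k features, relying on fs.order being sorted best to worst.
k <- 2
top <- head(res$fs.order, k)
x_top <- x[, top, drop = FALSE]
```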
Author(s)
Wanchang Lin
See Also
Examples
## prepare data set
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]
## replace zeros with NAs
dat <- mv.zene(dat)
## missing values summary
mv <- mv.stats(dat, grp=cls)
mv ## View the missing value pattern
## filter missing value variables
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)
## fill NAs with mean
dat <- mv.fill(dat,method="mean")
## log transformation
dat <- preproc(dat, method="log10")
## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE]
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]
## apply PLS methods for feature selection
res.pls <- fs.pls(mat,grp, ncomp=4)
res.plsvip <- fs.plsvip(mat,grp, ncomp=4)
res.plsvip.1 <- fs.plsvip.1(mat,grp, ncomp=4)
res.plsvip.2 <- fs.plsvip.2(mat,grp, ncomp=4)
## check differences among these methods
fs.order <- data.frame(pls      = res.pls$fs.order,
                       plsvip   = res.plsvip$fs.order,
                       plsvip.1 = res.plsvip.1$fs.order,
                       plsvip.2 = res.plsvip.2$fs.order)
head(fs.order, 20)