variable.selection {plsgenomics} | R Documentation
Variable selection using the PLS weights
Description
The function variable.selection performs variable selection for binary classification.
Usage
variable.selection(X, Y, nvar=NULL)
Arguments
X |
an (n x p) data matrix of predictors. X may be a matrix or a data frame. Each row corresponds to an observation and each column corresponds to a predictor variable. |
Y |
a vector of length n giving the classes of the n observations. The two classes must be coded as 1 and 2. |
nvar |
the number of variables to be returned. If |
Details
The function variable.selection orders the variables according to the absolute value of the weight defining the first PLS component. This ordering is equivalent to the ordering obtained with the F-statistic and with the t-test with equal variances (Boulesteix, 2004).
For computational reasons, the function variable.selection does not run the PLS algorithm itself, but the resulting ordering of the variables is exactly equivalent to the ordering obtained using the PLS weights output by pls.regression.
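The equivalence described above can be checked numerically with base R alone. The following is a minimal sketch, not the package's internal code: it assumes that each column of X is standardized to unit variance before the weight is computed (this page does not say whether variable.selection scales the data internally), and the helper names rank_by_pls_weight and rank_by_t are hypothetical. The first PLS weight vector is taken to be proportional to the cross-product of the standardized predictors with the centered class labels.

```r
# Hypothetical helper: rank variables by the absolute first PLS weight.
# Assumption: columns of X are centered and scaled to unit variance.
rank_by_pls_weight <- function(X, Y) {
  Xs <- scale(as.matrix(X))        # center and scale each column
  yc <- Y - mean(Y)                # center the class labels (coded 1 and 2)
  w  <- drop(crossprod(Xs, yc))    # proportional to the first PLS weight vector
  order(abs(w), decreasing = TRUE)
}

# Hypothetical helper: rank variables by the absolute two-sample t-statistic
# computed with equal variances.
rank_by_t <- function(X, Y) {
  X <- as.matrix(X)
  tstat <- apply(X, 2, function(x) {
    t.test(x[Y == 1], x[Y == 2], var.equal = TRUE)$statistic
  })
  order(abs(tstat), decreasing = TRUE)
}

# Simulated data: 20 observations, 5 variables, two classes of 10
set.seed(1)
X <- matrix(rnorm(20 * 5), 20, 5)
Y <- rep(1:2, each = 10)
X[Y == 2, 3] <- X[Y == 2, 3] + 2   # shift variable 3 in class 2

ord_w <- rank_by_pls_weight(X, Y)
ord_t <- rank_by_t(X, Y)

# Compare the two orderings
identical(ord_w, ord_t)
```

Under the standardization assumption, the squared weight of a variable is proportional to its between-class sum of squares while the total sum of squares is the same for every variable, so the F-statistic (the square of the equal-variance t-statistic) is an increasing function of the squared weight. Barring ties, the two orderings are therefore expected to coincide.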
Value
A vector of length nvar (or of length p if nvar=NULL) containing the indices of the selected variables. The indices are ordered from the best to the worst variable.
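Because the returned value is a vector of column indices, it can be used directly to subset the predictor matrix. The sketch below illustrates this; the column names and the object X_sel are hypothetical, and the fallback index vector is only a placeholder so that the snippet also runs when plsgenomics is not installed.

```r
# Small example data: 4 observations, 3 variables
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)
colnames(X) <- c("v1", "v2", "v3")   # hypothetical column names
Y <- c(1, 1, 2, 2)

if (requireNamespace("plsgenomics", quietly = TRUE)) {
  # indices of the 2 best variables, best first
  sel <- plsgenomics::variable.selection(X, Y, nvar = 2)
} else {
  # placeholder indices, NOT the result of variable.selection
  sel <- c(1, 2)
}

# keep only the selected columns; drop = FALSE preserves the matrix shape
X_sel <- X[, sel, drop = FALSE]
dim(X_sel)
```

Using drop = FALSE keeps X_sel a matrix even when nvar = 1, which matters if the reduced data are passed on to functions that expect a matrix.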
Author(s)
Anne-Laure Boulesteix (https://www.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
References
A. L. Boulesteix (2004). PLS dimension reduction for classification with microarray data, Statistical Applications in Genetics and Molecular Biology 3, Issue 1, Article 33.
A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Briefings in Bioinformatics 8(1), 32–44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
See Also
Examples
# load plsgenomics library
library(plsgenomics)
# generate X and Y (4 observations and 3 variables)
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)
Y <- c(1, 1, 2, 2)
# select the 2 best variables
variable.selection(X, Y, nvar = 2)
# order the 3 variables
variable.selection(X, Y)
# load the leukemia data
data(leukemia)
# select the 50 best variables from the leukemia data
variable.selection(leukemia$X, leukemia$Y, nvar = 50)