rpls.cv {plsgenomics} | R Documentation |
Determination of the ridge regularization parameter and the number of PLS components to be used for classification with RPLS for binary data
Description
The function rpls.cv determines the best ridge regularization parameter and the best
number of PLS components to be used for classification with the RPLS algorithm of
Fort and Lambert-Lacroix (2005).
Usage
rpls.cv(Ytrain, Xtrain, LambdaRange, ncompMax, NbIterMax=50, ncores=1)
Arguments
Xtrain |
a (ntrain x p) data matrix of predictors. |
Ytrain |
a ntrain vector of responses. |
LambdaRange |
the vector of positive real values from which the best ridge regularization parameter is chosen by cross-validation. |
ncompMax |
a positive integer. The best number of components is chosen from
1,..., |
NbIterMax |
a positive integer. |
ncores |
a positive integer. The number of cores to be used for parallel computing (if different from 1) |
Details
A cross-validation procedure is used to determine the best ridge regularization parameter and
number of PLS components to be used for classification with RPLS for binary data
(for categorical data see mrpls and mrpls.cv).
At each cross-validation run, Xtrain is split into a pseudo training
set (ntrain-1 samples) and a pseudo test set (1 sample), and the classification error rate is
determined for each value of the ridge regularization parameter and each number of components.
Finally, the function rpls.cv returns the values of the ridge regularization parameter and
number of PLS components for which the mean classification error rate is minimal.
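The leave-one-out grid search described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by rpls.cv: loo_grid_search and toy_fit_predict are hypothetical names, the stand-in classifier is a ridge regression on leading principal components (not RPLS), and ties between equally good parameter pairs are broken by taking the first minimum, which the documentation above does not specify.

```r
# Hypothetical helper: leave-one-out grid search over (lambda, ncomp).
# fit_predict(Xtr, ytr, xte, lambda, ncomp) must return a 0/1 label for xte.
loo_grid_search <- function(X, y, lambdas, ncomp_max, fit_predict) {
  n <- nrow(X)
  # err[i, k] = mean leave-one-out error for lambdas[i] and k components
  err <- matrix(0, nrow = length(lambdas), ncol = ncomp_max)
  for (i in seq_along(lambdas)) {
    for (k in seq_len(ncomp_max)) {
      wrong <- 0
      for (s in seq_len(n)) {
        # pseudo training set: all samples but s; pseudo test set: sample s
        pred <- fit_predict(X[-s, , drop = FALSE], y[-s],
                            X[s, , drop = FALSE], lambdas[i], k)
        wrong <- wrong + (pred != y[s])
      }
      err[i, k] <- wrong / n
    }
  }
  # Assumption: ties are resolved by taking the first minimum found
  best <- which(err == min(err), arr.ind = TRUE)[1, ]
  list(Lambda = lambdas[best[["row"]]], ncomp = best[["col"]], error = err)
}

# Stand-in classifier (NOT RPLS): ridge regression on the first ncomp
# principal components, thresholded at 0.5.
toy_fit_predict <- function(Xtr, ytr, xte, lambda, ncomp) {
  mu <- colMeans(Xtr)
  Xc <- sweep(Xtr, 2, mu)                          # center the training data
  V <- svd(Xc)$v[, seq_len(ncomp), drop = FALSE]   # leading directions
  Tr <- Xc %*% V                                   # training scores
  beta <- solve(crossprod(Tr) + lambda * diag(ncomp),
                crossprod(Tr, ytr - mean(ytr)))
  score <- mean(ytr) + (sweep(xte, 2, mu) %*% V) %*% beta
  as.numeric(score > 0.5)
}

# Simulated data, for illustration only
set.seed(1)
X <- matrix(rnorm(20 * 5), 20, 5)
y <- as.numeric(X[, 1] + 0.3 * rnorm(20) > 0)
cv <- loo_grid_search(X, y, lambdas = c(0.1, 1), ncomp_max = 3,
                      fit_predict = toy_fit_predict)
cv$Lambda
cv$ncomp
```

The returned list mirrors the Lambda and ncomp components listed under Value; the extra error matrix is an addition of this sketch that makes it possible to inspect the whole grid.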
Value
A list with the following components:
Lambda |
the optimal regularization parameter. |
ncomp |
the optimal number of PLS components. |
Author(s)
Sophie Lambert-Lacroix (http://membres-timc.imag.fr/Sophie.Lambert/).
References
G. Fort and S. Lambert-Lacroix (2005). Classification using Partial Least Squares with Penalized Logistic Regression, Bioinformatics, vol 21, n 8, 1104-1111.
See Also
Examples
## Not run:
## takes between 5 and 15 seconds
# load plsgenomics library
library(plsgenomics)
# load Colon data
data(Colon)
IndexLearn <- c(sample(which(Colon$Y==2),12),sample(which(Colon$Y==1),8))
# preprocess data
res <- preprocess(Xtrain=Colon$X[IndexLearn,], Xtest=Colon$X[-IndexLearn,],
                  Threshold=c(100,16000), Filtering=c(5,500),
                  log10.scale=TRUE, row.stand=TRUE)
# the results are given in res$pXtrain and res$pXtest
# Determine optimum ncomp and lambda
# (the class labels in Colon$Y are shifted from {1,2} to {0,1} by subtracting 1)
nl <- rpls.cv(Ytrain=Colon$Y[IndexLearn]-1, Xtrain=res$pXtrain,
              LambdaRange=c(0.1,1), ncompMax=3)
# perform prediction by RPLS
resrpls <- rpls(Ytrain=Colon$Y[IndexLearn]-1, Xtrain=res$pXtrain, Lambda=nl$Lambda,
                ncomp=nl$ncomp, Xtest=res$pXtest)
# number of misclassified test samples
sum(resrpls$Ytest!=Colon$Y[-IndexLearn]-1)
## End(Not run)