spls.cv {plsgenomics} | R Documentation |
Cross-validation procedure to calibrate the parameters (ncomp, lambda.l1) of the Adaptive Sparse PLS regression
Description
The function spls.cv chooses the optimal values of the hyper-parameters (ncomp, lambda.l1) of the spls procedure by minimizing the mean squared error of prediction over the hyper-parameter grid, using the adaptive SPLS algorithm of Durif et al. (2018).
Usage
spls.cv(
X,
Y,
lambda.l1.range,
ncomp.range,
weight.mat = NULL,
adapt = TRUE,
center.X = TRUE,
center.Y = TRUE,
scale.X = TRUE,
scale.Y = TRUE,
weighted.center = FALSE,
return.grid = FALSE,
ncores = 1,
nfolds = 10,
nrun = 1,
verbose = FALSE
)
Arguments
X |
a (n x p) data matrix of predictors. |
Y |
a (n) vector of (continuous) responses. |
lambda.l1.range |
a vector of positive real values, in [0,1].
|
ncomp.range |
a vector of positive integers. |
weight.mat |
a (ntrain x ntrain) matrix used to weight the l2 metric
in the observation space. It can be the inverse covariance of the Ytrain
observations in a heteroskedastic context. If NULL, the l2 metric is the
standard one, corresponding to a homoskedastic model. |
adapt |
a boolean value indicating whether the sparse PLS selection step should be adaptive or not (see details). |
center.X |
a boolean value indicating whether the data matrices should be centered.
|
center.Y |
a boolean value indicating whether the response values should be centered.
|
scale.X |
a boolean value indicating whether the data matrices should be scaled.
|
scale.Y |
a boolean value indicating whether the response values should be scaled.
|
weighted.center |
a boolean value indicating whether the centering should take into account the weighted l2 metric or not (if TRUE, weight.mat must be non-NULL). |
return.grid |
a boolean value indicating whether the grid of hyper-parameter values, with the corresponding mean prediction error over the folds, should be returned or not. |
ncores |
a positive integer indicating the number of cores that the cross-validation is allowed to use for parallel computation (see details). |
nfolds |
a positive integer indicating the number of folds in the
K-fold cross-validation procedure, default is 10. |
nrun |
a positive integer indicating how many times the K-fold cross-validation procedure should be repeated, default is 1. |
verbose |
a boolean value indicating verbosity. |
Details
The columns of the data matrix X need not be standardized beforehand, since standardization can be performed by the function spls.cv as a preliminary step.
The procedure is described in Durif et al. (2018). The K-fold cross-validation can be summarized as follows: the training set is partitioned into K folds; for each value of the hyper-parameters the model is fit K times, each time holding out one fold to compute the prediction error and fitting the model on the remaining observations. The cross-validation procedure returns the optimal hyper-parameter values, meaning the ones that minimize the mean squared error of prediction averaged over all the folds.
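The K-fold scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the internal code of spls.cv: a closed-form ridge regression (the hypothetical helper fit_ridge) stands in for the sparse PLS fit, a single hyper-parameter is tuned instead of the (ncomp, lambda.l1) pair, and the data are simulated.

```r
# Illustrative K-fold grid search in base R. A closed-form ridge regression
# stands in for the sparse PLS fit; this is NOT the internal code of spls.cv.
set.seed(1)
n <- 60
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- drop(X %*% c(1, -1, 0.5, 0, 0)) + rnorm(n, sd = 0.5)

# Hypothetical stand-in model: ridge regression with penalty `lambda`
fit_ridge <- function(X, Y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))
}

nfolds <- 5
lambda.grid <- c(0.01, 0.1, 1, 10, 100)

# Randomly assign each observation to one of the K folds
fold.id <- sample(rep(seq_len(nfolds), length.out = n))

# For each grid value, average the held-out mean squared error over the folds
cv.error <- sapply(lambda.grid, function(lambda) {
  mean(sapply(seq_len(nfolds), function(k) {
    test <- fold.id == k
    beta <- fit_ridge(X[!test, , drop = FALSE], Y[!test], lambda)
    pred <- drop(X[test, , drop = FALSE] %*% beta)
    mean((Y[test] - pred)^2)
  }))
})

# Grid of tested values with their averaged error, and the minimizer
cv.grid <- data.frame(lambda = lambda.grid, error = cv.error)
lambda.opt <- lambda.grid[which.min(cv.error)]
```

Repeating the whole procedure with fresh fold assignments and averaging the errors again would correspond to the role of nrun.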
This procedure uses the mclapply function from the parallel package, which is available on GNU/Linux and macOS. Users of Microsoft Windows can refer to the README file in the source to be able to use an mclapply-type function.
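As a minimal sketch of the portability issue (not taken from the package source), the snippet below picks a core count that is safe on every platform before calling mclapply. The function fold_task is a hypothetical placeholder for one cross-validation fit, and the choice of 2 cores on non-Windows systems is an arbitrary assumption.

```r
library(parallel)

# Forking is unavailable on Windows, where mclapply only accepts mc.cores = 1
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical per-fold task standing in for one cross-validation fit
fold_task <- function(k) k^2

# Run the tasks, in parallel where forking is supported
res <- mclapply(seq_len(4), fold_task, mc.cores = ncores)
unlist(res)
```

The same guard could be used to set the ncores argument of spls.cv, with ncores = 1 being the safe choice when in doubt.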
Value
An object with the following attributes
lambda.l1.opt |
the optimal value in lambda.l1.range. |
ncomp.opt |
the optimal value in ncomp.range. |
cv.grid |
the grid of hyper-parameters and corresponding prediction
error rate over the folds.
|
Author(s)
Ghislain Durif (https://gdurif.perso.math.cnrs.fr/).
References
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., Picard, F., 2018. High dimensional classification with combined adaptive sparse PLS and logistic regression. Bioinformatics 34, 485–493. doi:10.1093/bioinformatics/btx571. Available at http://arxiv.org/abs/1502.05933.
See Also
Examples
## Not run:
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
sample1 <- sample.cont(n=n, p=p, kstar=10, lstar=2,
                       beta.min=0.25, beta.max=0.75, mean.H=0.2,
                       sigma.H=10, sigma.F=5, sigma.E=5)
X <- sample1$X
Y <- sample1$Y
### hyper-parameters values to test
lambda.l1.range <- seq(0.05,0.95,by=0.1) # between 0 and 1
ncomp.range <- 1:10
### tuning the hyper-parameters
cv1 <- spls.cv(X=X, Y=Y, lambda.l1.range=lambda.l1.range,
               ncomp.range=ncomp.range, weight.mat=NULL, adapt=TRUE,
               center.X=TRUE, center.Y=TRUE,
               scale.X=TRUE, scale.Y=TRUE, weighted.center=FALSE,
               return.grid=TRUE, ncores=1, nfolds=10, nrun=1)
str(cv1)
### optimal values
cv1$lambda.l1.opt
cv1$ncomp.opt
## End(Not run)