multinom.spls.cv {plsgenomics} | R Documentation |
Cross-validation procedure to calibrate the parameters (ncomp, lambda.l1, lambda.ridge) for the multinomial-SPLS method
Description
The function multinom.spls.cv
chooses the optimal values for the
hyper-parameters of the multinom.spls
procedure by minimizing the
averaged prediction error over the hyper-parameter grid,
using the multinomial-SPLS algorithm of Durif et al. (2018).
Usage
multinom.spls.cv(
X,
Y,
lambda.ridge.range,
lambda.l1.range,
ncomp.range,
adapt = TRUE,
maxIter = 100,
svd.decompose = TRUE,
return.grid = FALSE,
ncores = 1,
nfolds = 10,
nrun = 1,
center.X = TRUE,
scale.X = FALSE,
weighted.center = TRUE,
seed = NULL,
verbose = TRUE
)
Arguments
X |
a (n x p) data matrix of predictors. |
Y |
a (n) vector of class labels (categorical responses). |
lambda.ridge.range |
a vector of positive real values.
|
lambda.l1.range |
a vector of positive real values, in [0,1].
|
ncomp.range |
a vector of positive integers. |
adapt |
a boolean value, indicating whether the sparse PLS selection step should be adaptive or not (see details). |
maxIter |
a positive integer, the maximal number of iterations in the RIRLS algorithm (see details). |
svd.decompose |
a boolean parameter. |
return.grid |
a boolean value indicating whether the grid of hyper-parameter values, with the corresponding mean prediction error rate over the folds, should be returned or not. |
ncores |
a positive integer, indicating the number of cores that the cross-validation is allowed to use for parallel computation (see details). |
nfolds |
a positive integer indicating the number of folds in the
K-fold cross-validation procedure, default is 10. |
nrun |
a positive integer indicating how many times the K-fold cross-validation procedure should be repeated, default is 1. |
center.X |
a boolean value indicating whether the data matrix X should be centered or not.
|
scale.X |
a boolean value indicating whether the data matrix X should be scaled or not.
|
weighted.center |
a boolean value indicating whether the centering should take into account the weighted l2 metric or not in the SPLS step. |
seed |
a positive integer value (default is NULL). If not NULL, the seed for pseudo-random number generation is set accordingly. |
verbose |
a boolean parameter indicating the verbosity. |
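The three range arguments together define the grid of candidate models explored by the cross-validation. The following is only an illustration in base R, not the package's internal code; the variable name `grid` and its column names are assumptions made for this sketch.

```r
# Illustrative only: enumerate every (lambda.ridge, lambda.l1, ncomp)
# combination that a grid search over the three ranges would visit.
lambda.ridge.range <- signif(10^seq(-2, 3, length.out = 21), digits = 3)
lambda.l1.range <- seq(0.05, 0.95, by = 0.1)
ncomp.range <- 1:10

# One row per candidate combination (column names chosen for this sketch)
grid <- expand.grid(lambda.ridge = lambda.ridge.range,
                    lambda.l1 = lambda.l1.range,
                    ncomp = ncomp.range)
nrow(grid)  # number of candidate combinations
```

With 21 ridge values, 10 l1 values and 10 component counts, the grid holds 21 x 10 x 10 = 2100 combinations, each of which is fit once per fold and per run, so the size of the ranges drives the computing time.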
Details
The columns of the data matrix X
need not be standardized,
since standardization is performed by the function multinom.spls.cv
as a preliminary step (see center.X and scale.X).
The procedure is described in Durif et al. (2018). The K-fold cross-validation can be summarized as follows: the training set is partitioned into K folds; for each value of the hyper-parameters the model is fit K times, each time holding out one fold to compute the prediction error rate and fitting the model on the remaining observations. The cross-validation procedure returns the optimal hyper-parameter values, meaning the ones that minimize the prediction error averaged over all the folds.
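The K-fold scheme described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by multinom.spls.cv: the helper names `kfold_cv`, `fit_predict` and `nearest_centroid`, and the toy hyper-parameter `nvar`, are hypothetical, and a simple nearest-centroid classifier stands in for the multinomial-SPLS fit.

```r
# Minimal K-fold cross-validation over a hyper-parameter grid (base R only).
# fit_predict is a hypothetical user-supplied function:
#   fit_predict(Xtrain, Ytrain, Xtest, param) -> predicted labels for Xtest
kfold_cv <- function(X, Y, grid, fit_predict, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(X)
  # Randomly assign each observation to one of the K folds
  fold_id <- sample(rep(seq_len(nfolds), length.out = n))
  err <- matrix(NA_real_, nrow = nrow(grid), ncol = nfolds)
  for (g in seq_len(nrow(grid))) {
    for (k in seq_len(nfolds)) {
      test <- fold_id == k
      pred <- fit_predict(X[!test, , drop = FALSE], Y[!test],
                          X[test, , drop = FALSE], grid[g, , drop = FALSE])
      err[g, k] <- mean(pred != Y[test])  # error rate on the held-out fold
    }
  }
  grid$error <- rowMeans(err)             # average over the folds
  list(opt = grid[which.min(grid$error), , drop = FALSE], cv.grid = grid)
}

# Toy classifier (an assumption for this demo): nearest centroid using
# only the first `nvar` columns of X.
nearest_centroid <- function(Xtrain, Ytrain, Xtest, param) {
  cols <- seq_len(param$nvar)
  classes <- sort(unique(Ytrain))
  centroids <- sapply(classes, function(cl)
    colMeans(Xtrain[Ytrain == cl, cols, drop = FALSE]))
  centroids <- matrix(centroids, nrow = length(cols))  # one column per class
  # Squared Euclidean distance from each test row to each class centroid
  d <- sapply(seq_along(classes), function(j)
    rowSums((Xtest[, cols, drop = FALSE] -
             matrix(centroids[, j], nrow(Xtest), length(cols), byrow = TRUE))^2))
  d <- matrix(d, nrow = nrow(Xtest))
  classes[max.col(-d, ties.method = "first")]          # closest centroid wins
}

# Simulated 3-class data in which only column 1 carries signal
set.seed(1)
Y <- rep(0:2, each = 30)
X <- matrix(rnorm(90 * 5), 90, 5)
X[, 1] <- X[, 1] + 3 * Y
res <- kfold_cv(X, Y, data.frame(nvar = 1:5), nearest_centroid,
                nfolds = 5, seed = 42)
res$opt
```

The returned list mirrors the idea of an optimal setting plus the full grid of averaged error rates; repeating the call with different seeds and averaging the results would correspond to the role played by nrun.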
This procedure uses mclapply
from the parallel
package,
available on GNU/Linux and macOS. Users of Microsoft Windows can refer to
the README file in the source to be able to use an mclapply-type function.
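As a rough illustration of the parallel pattern mentioned above (not the package's own code), independent tasks such as per-fold fits can be dispatched with mclapply. The function `fold_task` is a hypothetical placeholder, and the single-core fallback on Windows is an assumption of this sketch rather than the approach described in the package README.

```r
library(parallel)

# Hypothetical per-fold task standing in for "fit on K-1 folds, test on one".
fold_task <- function(k) k^2

# mclapply relies on forking, which Windows lacks: there it only accepts
# mc.cores = 1, so fall back to a single core on that platform.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:4, fold_task, mc.cores = ncores)
unlist(res)
```

Because each task is independent, the results come back in input order regardless of the number of cores used.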
Value
An object of class multinom.spls
with the following attributes
lambda.ridge.opt |
the optimal value in lambda.ridge.range. |
lambda.l1.opt |
the optimal value in lambda.l1.range. |
ncomp.opt |
the optimal value in ncomp.range. |
conv.per |
the overall percentage of models that converge during the cross-validation procedure. |
cv.grid |
the grid of hyper-parameters and corresponding prediction
error rate averaged over the folds. |
Author(s)
Ghislain Durif (https://gdurif.perso.math.cnrs.fr/).
References
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., Picard, F., 2018. High dimensional classification with combined adaptive sparse PLS and logistic regression. Bioinformatics 34, 485–493. doi:10.1093/bioinformatics/btx571. Available at http://arxiv.org/abs/1502.05933.
See Also
multinom.spls, multinom.spls.stab
Examples
## Not run:
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
nclass <- 3
sample1 <- sample.multinom(n=n, p=p, nb.class=nclass, kstar=10, lstar=2,
                           beta.min=0.25, beta.max=0.75, mean.H=0.2,
                           sigma.H=10, sigma.F=5)
X <- sample1$X
Y <- sample1$Y
### hyper-parameters values to test
lambda.l1.range <- seq(0.05, 0.95, by=0.1) # between 0 and 1
ncomp.range <- 1:10
# log-linear range between 0.01 and 1000 for lambda.ridge.range
logspace <- function(d1, d2, n) exp(log(10) * seq(d1, d2, length.out=n))
lambda.ridge.range <- signif(logspace(d1=-2, d2=3, n=21), digits=3)
### tuning the hyper-parameters
cv1 <- multinom.spls.cv(X=X, Y=Y, lambda.ridge.range=lambda.ridge.range,
                        lambda.l1.range=lambda.l1.range,
                        ncomp.range=ncomp.range,
                        adapt=TRUE, maxIter=100, svd.decompose=TRUE,
                        return.grid=TRUE, ncores=1, nfolds=10)
str(cv1)
## End(Not run)