spls {plsgenomics} | R Documentation |
Adaptive Sparse Partial Least Squares (SPLS) regression
Description
The function spls
performs compression and variable selection
in the context of linear regression (with possible prediction)
using the adaptive SPLS algorithm of Durif et al. (2018).
Usage
spls(
Xtrain,
Ytrain,
lambda.l1,
ncomp,
weight.mat = NULL,
Xtest = NULL,
adapt = TRUE,
center.X = TRUE,
center.Y = TRUE,
scale.X = TRUE,
scale.Y = TRUE,
weighted.center = FALSE
)
Arguments
Xtrain |
a (ntrain x p) data matrix of predictor values.
|
Ytrain |
a (ntrain) vector of (continuous) responses. |
lambda.l1 |
a real value in [0,1], the sparsity hyper-parameter used to fit the model. |
ncomp |
a positive integer, the number of components used to fit the model. |
weight.mat |
a (ntrain x ntrain) matrix used to weight the l2 metric
in the observation space. It can be the inverse covariance matrix of the Ytrain
observations in a heteroskedastic context. If NULL, the l2 metric is the
standard one, corresponding to a homoskedastic model. |
Xtest |
a (ntest x p) matrix containing the predictor values for the
test data set. |
adapt |
a boolean value indicating whether the sparse PLS selection step should be adaptive or not (see details). |
center.X |
a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be centered.
|
center.Y |
a boolean value indicating whether the response values Ytrain should be centered.
|
scale.X |
a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be scaled.
|
scale.Y |
a boolean value indicating whether the response values Ytrain should be scaled.
|
weighted.center |
a boolean value indicating whether the centering should take the weighted l2 metric into account (if TRUE, weight.mat must be non-NULL). |
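The heteroskedastic case described for weight.mat can be illustrated with a minimal sketch. It assumes that the noise variance of each training observation is known and that observations are uncorrelated, so that the inverse covariance matrix is diagonal. The names ntrain and noise.var and the simulated variances are hypothetical, chosen only for illustration; they are not part of the package.

```r
set.seed(42)
ntrain <- 30

# hypothetical per-observation noise variances (assumed known here)
noise.var <- runif(ntrain, min = 0.5, max = 2)

# uncorrelated observations: the covariance matrix is diagonal,
# so its inverse is the diagonal matrix of inverse variances
cov.mat <- diag(noise.var)
weight.mat <- diag(1 / noise.var)

# sanity check: weight.mat %*% cov.mat should be the identity matrix
max(abs(weight.mat %*% cov.mat - diag(ntrain)))
```

A matrix built this way could then be supplied as weight.mat = weight.mat, together with weighted.center = TRUE if the centering should use the same metric.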
Details
The columns of the data matrices Xtrain and Xtest need not be standardized,
since standardization can be performed by the function spls as a preliminary step.
The procedure described in Durif et al. (2018) is used to compute
latent sparse components that are used in a regression model.
In addition, when a matrix Xtest is supplied, the procedure
predicts the response associated with these new values of the predictors.
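The preliminary standardization step can be sketched in base R. This is not the package's internal code. It assumes that the test matrix is standardized with the training means and standard deviations, in the same way the Examples section rescales Ytest with meanYtrain and sigmaYtrain. The toy matrices and the names meanX and sigmaX are made up for illustration.

```r
set.seed(1)

# toy data: 20 training and 5 test observations, 3 predictors
Xtrain <- matrix(rnorm(20 * 3), nrow = 20)
Xtest <- matrix(rnorm(5 * 3), nrow = 5)

# column statistics are computed on the training set only
meanX <- colMeans(Xtrain)
sigmaX <- apply(Xtrain, 2, sd)

# center and scale the training set
sXtrain <- scale(Xtrain, center = meanX, scale = sigmaX)

# apply the same training statistics to the test set (assumption)
sXtest <- scale(Xtest, center = meanX, scale = sigmaX)

# training columns now have mean 0 and standard deviation 1
round(colMeans(sXtrain), 10)
apply(sXtrain, 2, sd)
```

Reusing the training statistics keeps both sets on the same scale, so coefficients estimated on the standardized training data can be applied to the standardized test data.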
Value
An object of class spls
with the following attributes
Xtrain |
the ntrain x p predictor matrix. |
Ytrain |
the response observations. |
sXtrain |
the predictor matrix, centered and scaled if requested. |
sYtrain |
the response, centered and scaled if requested. |
betahat |
the linear coefficients in the model sYtrain = sXtrain %*% betahat + residuals.
|
betahat.nc |
the (p+1) vector containing the coefficients and intercept
for the non-centered and non-scaled model.
|
meanXtrain |
the (p) vector of Xtrain column means, used for centering if requested. |
sigmaXtrain |
the (p) vector of Xtrain column standard deviations, used for scaling if requested. |
meanYtrain |
the mean of Ytrain, used for centering if requested. |
sigmaYtrain |
the standard deviation of Ytrain, used for scaling if requested. |
X.score |
a (n x ncomp) matrix of the observation coordinates, or
scores, in the new component basis produced by the compression step
(sparse PLS). Each column t.k of X.score is an SPLS component. |
X.score.low |
a (n x ncomp) matrix being the PLS components only computed with the selected predictors. |
X.loading |
the (ncomp x p) matrix of coefficients in regression of
Xtrain over the new components |
Y.loading |
the (ncomp) vector of coefficients in regression of Ytrain
over the SPLS components |
X.weight |
a (p x ncomp) matrix of the predictor coefficients
in each component produced by sparse PLS. Each column w.k of
|
residuals |
the (ntrain) vector of residuals in the model sYtrain = sXtrain %*% betahat + residuals.
|
residuals.nc |
the (ntrain) vector of residuals in the non-centered
and non-scaled model.
|
hatY |
the (ntrain) vector containing the estimated response values
on the train set of centered and scaled (if requested) predictors
|
hatY.nc |
the (ntrain) vector containing the estimated response values
on the train set of non-centered and non-scaled predictors |
hatYtest |
the (ntest) vector containing the predicted values
for the response on the centered and scaled test set |
hatYtest.nc |
the (ntest) vector containing the predicted values
for the response on the non centered and non scaled test set |
A |
the active set of predictors selected by the procedure. |
betamat |
a list of length ncomp of coefficient vectors betahat in the model
with k components, for k = 1, ..., ncomp. |
new2As |
a (ncomp) list of subset of |
lambda.l1 |
the sparse hyper-parameter used to fit the model. |
ncomp |
the number of components used to fit the model. |
V |
the (ntrain x ntrain) matrix used to weight the metric in the sparse PLS step. |
adapt |
a boolean value, indicating whether the sparse PLS selection step was adaptive or not. |
Author(s)
Ghislain Durif (https://gdurif.perso.math.cnrs.fr/).
Adapted in part from spls code by H. Chun, D. Chung and S. Keles (https://CRAN.R-project.org/package=spls).
References
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., Picard, F., 2018. High dimensional classification with combined adaptive sparse PLS and logistic regression. Bioinformatics 34, 485–493. doi:10.1093/bioinformatics/btx571. Available at http://arxiv.org/abs/1502.05933.
Chun, H., Keles, S., 2010. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 3–25. doi:10.1111/j.1467-9868.2009.00723.x.
See Also
Examples
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
sample1 <- sample.cont(n=n, p=p, kstar=10, lstar=2, beta.min=0.25,
                       beta.max=0.75, mean.H=0.2, sigma.H=10,
                       sigma.F=5, sigma.E=5)
X <- sample1$X
Y <- sample1$Y
### splitting between learning and testing set
index.train <- sort(sample(1:n, size=round(0.7*n)))
index.test <- (1:n)[-index.train]
Xtrain <- X[index.train,]
Ytrain <- Y[index.train,]
Xtest <- X[index.test,]
Ytest <- Y[index.test,]
### fitting the model, and predicting new observations
model1 <- spls(Xtrain=Xtrain, Ytrain=Ytrain, lambda.l1=0.5, ncomp=2,
               weight.mat=NULL, Xtest=Xtest, adapt=TRUE, center.X=TRUE,
               center.Y=TRUE, scale.X=TRUE, scale.Y=TRUE,
               weighted.center=FALSE)
str(model1)
### plotting the estimation versus real values for the non centered response
plot(model1$Ytrain, model1$hatY.nc,
xlab="real Ytrain", ylab="Ytrain estimates")
points(-1000:1000,-1000:1000, type="l")
### plotting residuals versus centered response values
plot(model1$sYtrain, model1$residuals, xlab="sYtrain", ylab="residuals")
### plotting the predictor coefficients
plot(model1$betahat.nc, xlab="variable index", ylab="coeff")
### mean squared error of prediction on test sample (centered and scaled response)
sYtest <- as.matrix(scale(Ytest, center=model1$meanYtrain, scale=model1$sigmaYtrain))
sum((model1$hatYtest - sYtest)^2) / length(index.test)
### plotting predicted values versus centered and scaled real response values
## on the test set
plot(sYtest, model1$hatYtest, xlab="real sYtest", ylab="predicted values")
points(-1000:1000,-1000:1000, type="l")
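The mean squared error above is computed on the centered and scaled response. A minimal base R sketch of how it relates to the error on the original scale is given below. It assumes that predictions on the original scale are obtained by multiplying the scaled predictions by the training standard deviation and adding the training mean; under that assumption the error on the original scale is the scaled error multiplied by the squared standard deviation. The helper msep and the toy values (y, shat, mu, sigma) are hypothetical and not part of plsgenomics.

```r
# hypothetical helper: mean squared error of prediction
msep <- function(y, yhat) {
  stopifnot(length(y) == length(yhat))
  mean((as.numeric(y) - as.numeric(yhat))^2)
}

# toy response and hypothetical training mean and standard deviation
y <- c(1, 2, 3, 6)
mu <- 3
sigma <- 2

# response and predictions on the centered and scaled scale
sy <- (y - mu) / sigma
shat <- c(-1, -0.25, 0.5, 1)

# predictions mapped back to the original scale (assumption)
yhat <- shat * sigma + mu

msep.scaled <- msep(sy, shat)
msep.raw <- msep(y, yhat)

# the two errors differ by a factor sigma^2
all.equal(msep.raw, sigma^2 * msep.scaled)
```

With a fitted model, the same helper could be applied as msep(Ytest, model1$hatYtest.nc) to obtain the error on the original scale of the response.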