logit.spls {plsgenomics}    R Documentation
Classification procedure for binary response based on a logistic model, solved by a combination of the Ridge Iteratively Reweighted Least Squares (RIRLS) algorithm and the Adaptive Sparse PLS (SPLS) regression
Description
The function logit.spls
performs compression and variable selection
in the context of binary classification (with possible prediction)
using the algorithm of Durif et al. (2018), based on Ridge IRLS and sparse PLS.
Usage
logit.spls(
Xtrain,
Ytrain,
lambda.ridge,
lambda.l1,
ncomp,
Xtest = NULL,
adapt = TRUE,
maxIter = 100,
svd.decompose = TRUE,
center.X = TRUE,
scale.X = FALSE,
weighted.center = TRUE
)
Arguments
Xtrain: a (ntrain x p) data matrix of predictor values.
Ytrain: a (ntrain) vector of (binary) responses.
lambda.ridge: a positive real value. lambda.ridge is the Ridge regularization hyper-parameter used in the RIRLS step (see details).
lambda.l1: a positive real value, in [0,1]. lambda.l1 is the sparsity hyper-parameter used in the sparse PLS selection step (see details).
ncomp: a positive integer. ncomp is the number of PLS components.
Xtest: a (ntest x p) matrix containing the predictor values for the test data set.
adapt: a boolean value, indicating whether the sparse PLS selection step should be adaptive or not (see details).
maxIter: a positive integer. maxIter is the maximal number of iterations of the RIRLS algorithm (see details).
svd.decompose: a boolean parameter indicating whether the design matrix should be decomposed by SVD in the RIRLS step (see details).
center.X: a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be centered or not.
scale.X: a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be scaled or not.
weighted.center: a boolean value indicating whether the centering should take into account the weighted l2 metric or not in the SPLS step.
Details
The columns of the data matrices Xtrain and Xtest need not be standardized, since standardization can be performed by the function logit.spls as a preliminary step.
The procedure described in Durif et al. (2018) is used to compute latent sparse components that are used in a logistic regression model. In addition, when a matrix Xtest is supplied, the procedure predicts the responses associated with these new values of the predictors.
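For illustration, a minimal sketch of a call that leaves standardization to the function itself (assuming a predictor matrix X and a binary response Y, as in the Examples section below; the hyper-parameter values are arbitrary):

## sketch: raw (unstandardized) predictors are passed directly;
## centering/scaling is handled internally through center.X and scale.X
fit <- logit.spls(Xtrain=X, Ytrain=Y, lambda.ridge=2, lambda.l1=0.5,
                  ncomp=2, Xtest=NULL, center.X=TRUE, scale.X=FALSE)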
Value
An object of class logit.spls
with the following attributes
Coefficients: the (p+1) vector containing the linear coefficients associated with the predictors and the intercept in the logistic model explaining the response Y.
hatY: the (ntrain) vector containing the estimated response values on the train set.
hatYtest: the (ntest) vector containing the predicted labels for the observations from the test set Xtest.
DeletedCol: the vector containing the indexes of columns with null variance in the design matrix Xtrain.
A: the active set of predictors selected by the procedure.
Anames: vector of selected predictor names, i.e. the names of the columns from Xtrain that are in A.
converged: a {0,1} value indicating whether the RIRLS algorithm converged in less than maxIter iterations or not.
X.score: a (n x ncomp) matrix giving the observation coordinates, or scores, in the new component basis produced by the SPLS step (sparse PLS). Each column t.k of X.score is a new component.
X.weight: a (p x ncomp) matrix giving the coefficients of the predictors in each component produced by sparse PLS. Each column w.k of X.weight contains the predictor weights defining the corresponding component t.k.
Xtrain: the design matrix.
sXtrain: the scaled predictor matrix.
Ytrain: the response observations.
sPseudoVar: the scaled pseudo-response produced by the RIRLS algorithm.
lambda.ridge: the Ridge hyper-parameter used to fit the model.
lambda.l1: the sparse hyper-parameter used to fit the model.
ncomp: the number of components used to fit the model.
V: the (ntrain x ntrain) matrix used to weight the metric in the sparse PLS step.
proba: the (ntrain) vector of estimated probabilities for the observations in Xtrain.
proba.test: the (ntest) vector of predicted probabilities for the new observations in Xtest.
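As a sketch of how these attributes can be inspected (assuming a fitted object named model1, as in the Examples section below):

## hypothetical inspection of a fitted logit.spls object
model1$converged      # whether the RIRLS step converged within maxIter iterations
model1$Coefficients   # intercept and predictor coefficients of the logistic model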
Author(s)
Ghislain Durif (https://gdurif.perso.math.cnrs.fr/).
References
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., Picard, F., 2018. High dimensional classification with combined adaptive sparse PLS and logistic regression. Bioinformatics 34, 485–493. doi:10.1093/bioinformatics/btx571. Available at http://arxiv.org/abs/1502.05933.
See Also
Examples
## Not run:
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
sample1 <- sample.bin(n=n, p=p, kstar=20, lstar=2,
beta.min=0.25, beta.max=0.75,
mean.H=0.2, sigma.H=10, sigma.F=5)
X <- sample1$X
Y <- sample1$Y
### splitting between learning and testing set
index.train <- sort(sample(1:n, size=round(0.7*n)))
index.test <- (1:n)[-index.train]
Xtrain <- X[index.train,]
Ytrain <- Y[index.train,]
Xtest <- X[index.test,]
Ytest <- Y[index.test,]
### fitting the model, and predicting new observations
model1 <- logit.spls(Xtrain=Xtrain, Ytrain=Ytrain, lambda.ridge=2,
lambda.l1=0.5, ncomp=2, Xtest=Xtest, adapt=TRUE,
maxIter=100, svd.decompose=TRUE)
str(model1)
### prediction error rate
sum(model1$hatYtest!=Ytest) / length(index.test)
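### suggested additional checks (not in the original example): indexes of the
### selected predictors and predicted probabilities for the test set
model1$A
head(model1$proba.test)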
## End(Not run)