cvr.adaptive.ipflasso {ipflasso} | R Documentation |
Cross-validated integrative lasso with adaptive penalty factors
Description
Runs cvr.ipflasso, applying different data-based penalty factors to predictors from different blocks.
Usage
cvr.adaptive.ipflasso(X, Y, family, type.measure, standardize = TRUE,
alpha, type.step1, blocks, nfolds, ncv)
Arguments
X |
an (n x p) matrix of predictors with observations in rows and predictors in columns. |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or |
family |
should be "gaussian" for continuous |
type.measure |
the accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if |
standardize |
whether the predictors should be standardized or not. Default is TRUE. |
alpha |
the elastic net mixing parameter for step 1: |
type.step1 |
whether the models of step 1 should be run on the whole data set |
blocks |
a list of length M of the format |
nfolds |
the number of folds of the CV procedure. |
ncv |
the number of repetitions of the CV. Not to be confused with |
Details
The penalty factors are the inverse arithmetic means of the absolute model coefficients per block, generated in a first step of the function. The user can choose to determine these coefficients by running a Lasso model (alpha=1) or a Ridge model (alpha=0), either on the whole data set (type.step1="comb") or separately for each block (type.step1="sep"). If type.step1 is omitted, it will be set to "sep" for Lasso and to "comb" for Ridge.
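The penalty-factor rule described above can be sketched in base R. The coefficient vector beta and the block layout are made-up values for illustration, not output of the package, and the snippet is not the package's internal code.

```r
# Hypothetical step-1 coefficients for 6 predictors (intercept excluded)
beta <- c(0.5, -1.5, 0, 0.2, 0, -0.1)

# Hypothetical block layout: predictors 1-2 and predictors 3-6
blocks <- list(block1 = 1:2, block2 = 3:6)

# Arithmetic mean of the absolute coefficients within each block
means_step1 <- sapply(blocks, function(idx) mean(abs(beta[idx])))

# Penalty factor per block: the inverse of that mean
pf <- 1 / means_step1

print(means_step1)
print(pf)
```

A block with large absolute coefficients in step 1 therefore receives a small penalty factor, and a block with small coefficients receives a large one.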
If a Lasso model in step 1 returns any zero coefficient mean, the corresponding block will be excluded from the input data set X and step 2 will be run with the remaining blocks. If all model coefficient means are zero, step 2 will not be performed.
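The exclusion logic can be illustrated with a minimal base R sketch. The block means and block indices below are invented for the example, and the way the package itself re-indexes the remaining blocks is not shown here; treat the variable names as assumptions.

```r
# Hypothetical per-block means of absolute step-1 coefficients
means_step1 <- c(block1 = 0.8, block2 = 0, block3 = 0.05)

# Hypothetical block layout over 9 predictors
blocks <- list(block1 = 1:3, block2 = 4:5, block3 = 6:9)

# Indices of blocks whose coefficient mean is exactly zero
exc <- which(means_step1 == 0)

if (length(exc) == length(blocks)) {
  # Every block mean is zero, so there is nothing left for step 2
  message("All block means are zero: step 2 would be skipped.")
} else {
  # Keep only the blocks with a non-zero mean
  keep <- setdiff(seq_along(blocks), exc)
  # Columns of X that would remain for step 2
  cols_kept <- unlist(blocks[keep], use.names = FALSE)
  # Penalty factors for the remaining blocks
  pf <- 1 / means_step1[keep]
}
```

In this example block2 is dropped, so step 2 would see only the columns of block1 and block3.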
Value
A list with the following components:
coeff |
the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the models (for all families other than In the special case of separate step 1 Lasso models and all coefficient means equal to zero, the intercept is the average of the separate model intercepts per block. |
ind.bestlambda |
the index of the best lambda according to CV. |
lambda |
the lambda sequence. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, it is the lambda sequence with the highest lambda value among the lambda sequences of all blocks. |
cvm |
the CV estimate of the measure specified by In the special case of separate step 1 Lasso models and all coefficient means equal to zero, cvm is the average of the separate model cvms per block. |
nzero |
the number of non-zero coefficients in the selected model. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, nzero is the sum of the non-zero coefficients of the separate models per block. |
family |
see arguments. |
means.step1 |
the arithmetic means of the absolute model coefficients per block, returned by the first step of the function. |
exc |
the exclusion vector containing the indices of the block(s) to be excluded from |
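As a rough guide to reading the returned list, the sketch below builds a small mock object with the component names listed above and extracts the model at the best lambda. The numbers are invented and the mock is only a stand-in for a real return value; it assumes the first row of coeff holds the intercept, as described for coeff.

```r
# Mock result with invented values; only the component names follow the list above
res <- list(
  coeff = matrix(c(0.1, 0.0,  0.0,    # column 1: first lambda
                   0.2, 0.5,  0.0,    # column 2: second lambda
                   0.3, 0.7, -0.4),   # column 3: third lambda
                 nrow = 3),           # rows: intercept, predictor 1, predictor 2
  ind.bestlambda = 2,
  lambda = c(0.5, 0.1, 0.01),
  cvm = c(0.40, 0.25, 0.30)
)

# Coefficients (intercept first) of the model at the best lambda
best_coef <- res$coeff[, res$ind.bestlambda]

# The best lambda value and its cross-validated measure
best_lambda <- res$lambda[res$ind.bestlambda]
best_cvm <- res$cvm[res$ind.bestlambda]

# Indices of the predictors with non-zero coefficients (intercept dropped)
selected <- which(best_coef[-1] != 0)
```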
Author(s)
Gerhard Schulze (g-schulze@t-online.de)
References
Schulze, Gerhard (2017): Clinical Outcome Prediction Based on Multi-Omics Data: Extension of IPF-LASSO. Master's thesis, Ludwig-Maximilians-Universitaet Muenchen (Department of Statistics: Technical Reports). https://doi.org/10.5282/ubm/epub.59092
Examples
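# load the ipflasso package
library(ipflasso)
# generate dummy data (seed fixed so the example is reproducible)
set.seed(1)
X <- matrix(rnorm(50 * 200), 50, 200)
Y <- rbinom(50, 1, 0.5)
# two blocks (predictors 1-50 and 51-200), Lasso in step 1 (alpha = 1)
cvr.adaptive.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
                      standardize = FALSE, alpha = 1,
                      blocks = list(block1 = 1:50, block2 = 51:200),
                      nfolds = 5, ncv = 10)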