
cvr.adaptive.ipflasso {ipflasso}

R Documentation

Cross-validated integrative lasso with adaptive penalty factors

Description

Runs cvr.ipflasso, applying different data-based penalty factors to predictors from different blocks.

Usage

cvr.adaptive.ipflasso(X, Y, family, type.measure, standardize = TRUE,
                                  alpha, type.step1, blocks, nfolds, ncv)

Arguments

`X`	an (n x p) matrix of predictors with observations in rows and predictors in columns.
`Y`	n-vector giving the value of the response (either continuous, numeric-binary 0/1, or `Surv` object).
`family`	should be "gaussian" for continuous `Y`, "binomial" for binary `Y`, "cox" for `Y` of type `Surv`.
`type.measure`	the accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if `family="binomial"`, "mse" (mean squared error) if `family="gaussian"` and partial likelihood if `family="cox"`. If `family="binomial"`, one may specify `type.measure="auc"` (area under the ROC curve).
`standardize`	whether the predictors should be standardized or not. Default is TRUE.
`alpha`	the elastic net mixing parameter for step 1: `alpha`=1 yields the L1 penalty (Lasso), `alpha`=0 yields the L2 penalty (Ridge).
`type.step1`	whether the models of step 1 should be run on the whole data set `X` (`type.step1="comb"`) or separately for each block (`type.step1="sep"`).
`blocks`	a list of length M of the format `list(block1=...,block2=...,...)` where the dots should be replaced by the indices of the predictors included in this block. The blocks should form a partition of 1:p.
`nfolds`	the number of folds of the CV procedure.
`ncv`	the number of repetitions of the CV. Not to be confused with `nfolds`. For example, if one repeats 50 times 5-fold-CV (i.e. considers 50 random partitions into 5 folds in turn and averages the results), `nfolds` equals 5 and `ncv` equals 50.
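
Two of the conventions above are easy to get wrong: `blocks` must form a partition of 1:p, and `nfolds` and `ncv` play different roles. The base R sketch below checks the first and illustrates the second. The object names (`idx`, `is_partition`, `foldid`) are illustrative and not part of ipflasso, and plain random fold assignment is an assumption; the package may draw its folds differently.

```r
# Illustrative sketch in base R; none of these object names come from ipflasso.
p <- 200
blocks <- list(block1 = 1:50, block2 = 51:200)

# The blocks form a partition of 1:p if every index occurs exactly once.
idx <- unlist(blocks, use.names = FALSE)
is_partition <- length(idx) == p && setequal(idx, seq_len(p))

# nfolds = number of folds per partition, ncv = number of random partitions.
# Assumption: folds are drawn by plain random assignment.
set.seed(1)
n <- 50; nfolds <- 5; ncv <- 50
foldid <- replicate(ncv, sample(rep(seq_len(nfolds), length.out = n)))
dim(foldid)  # n rows (observations), ncv columns (repetitions)
```

Each column of `foldid` is one random partition into `nfolds` folds, so repeating 5-fold CV 50 times corresponds to 50 such columns.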

Details

The penalty factors are the inverse arithmetic means of the absolute model coefficients per block, generated in a first step of the function. The user can choose to determine these coefficients by running a Lasso model (alpha=1) or a Ridge model (alpha=0) either on the whole data set (type.step1="comb") or separately for each block (type.step1="sep"). If type.step1 is omitted, it will be set to "sep" for Lasso and to "comb" for Ridge. If a Lasso model in step 1 returns any zero coefficient mean, the corresponding block will be excluded from the input data set X and step 2 will be run with the remaining blocks. If all model coefficient means are zero, step 2 will not be performed.
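
The arithmetic behind the penalty factors can be sketched in base R as follows. The coefficient values are made up for illustration, and the names `beta`, `keep` and `pf` are hypothetical; this is not the package's internal code, and any further rescaling the package may apply is not shown.

```r
# Made-up step-1 coefficients for 8 predictors in 3 blocks (illustration only).
beta <- c(0.5, -1.5, 0, 0, 0.2, -0.2, 0, 0)
blocks <- list(block1 = 1:2, block2 = 3:6, block3 = 7:8)

# Arithmetic mean of the absolute coefficients per block.
means.step1 <- sapply(blocks, function(idx) mean(abs(beta[idx])))

# Blocks whose mean is exactly zero are excluded before step 2.
exc <- unname(which(means.step1 == 0))
keep <- setdiff(seq_along(blocks), exc)

# Penalty factors for the remaining blocks: inverse of the block means.
pf <- 1 / means.step1[keep]
```

In this toy case block3 has a zero mean and would be dropped, while block2, whose coefficients are on average ten times smaller than those of block1, receives a penalty factor ten times larger.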

Value

A list with the following elements:

`coeff`	the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the models (for all families other than `"cox"`). In the special case of separate step 1 Lasso models and all coefficient means equal to zero, the intercept is the average of the separate model intercepts per block.
`ind.bestlambda`	the index of the best lambda according to CV.
`lambda`	the lambda sequence. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, it is the lambda sequence with the highest lambda value among the lambda sequences of all blocks.
`cvm`	the CV estimate of the measure specified by `type.measure` for each candidate lambda value. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, cvm is the average of the separate model cvms per block.
`nzero`	the number of non-zero coefficients in the selected model. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, nzero is the sum of the numbers of non-zero coefficients of the separate models per block.
`family`	see arguments.
`means.step1`	the arithmetic means of the absolute model coefficients per block, returned by the first step of the function.
`exc`	the exclusion vector containing the indices of the block(s) to be excluded from `X`.
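
As a rough illustration of how these elements fit together, the sketch below builds a mock list with the documented element names and extracts the coefficients at the best lambda. All values, the row names (including "(Intercept)") and the helper names `best_lambda`, `best_coef` and `selected` are assumptions made for this example; a real result should be inspected with `str()` before relying on its layout.

```r
# Mock result with the documented element names; all values are made up.
fit <- list(
  coeff = matrix(c(0.1, 0, 0,  0.2, 0.5, 0,  0.3, 0.7, -0.1), nrow = 3,
                 dimnames = list(c("(Intercept)", "V1", "V2"), NULL)),
  ind.bestlambda = 2,
  lambda = c(0.3, 0.2, 0.1),
  cvm = c(0.50, 0.40, 0.45)
)

# Lambda value and coefficient column selected by CV.
best_lambda <- fit$lambda[fit$ind.bestlambda]
best_coef <- fit$coeff[, fit$ind.bestlambda]

# Drop the intercept row (present for non-Cox families), keep non-zero predictors.
selected <- names(best_coef)[-1][best_coef[-1] != 0]
```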

Author(s)

Gerhard Schulze (g-schulze@t-online.de)

References

Schulze, Gerhard (2017): Clinical Outcome Prediction Based on Multi-Omics Data: Extension of IPF-LASSO. Master's thesis, Ludwig-Maximilians-Universitaet Muenchen (Department of Statistics: Technical Reports). https://doi.org/10.5282/ubm/epub.59092

Examples

# load ipflasso library
library(ipflasso)

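# generate dummy data: 50 observations, 200 predictors, binary response
set.seed(1)  # fixed seed so the example is reproducible
X <- matrix(rnorm(50 * 200), 50, 200)
Y <- rbinom(50, 1, 0.5)

# two blocks (predictors 1-50 and 51-200), Lasso in step 1, 10 repetitions of 5-fold CV
cvr.adaptive.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
                      standardize = FALSE, alpha = 1,
                      blocks = list(block1 = 1:50, block2 = 51:200),
                      nfolds = 5, ncv = 10)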

[Package ipflasso version 1.1 Index]