CVLasoelascox {MicrobiomeSurv}    R Documentation

Cross Validations for Lasso and Elastic Net Survival Predictive Models and Classification

Description

The function performs cross validation for Lasso, Elastic net and Ridge regression models before the survival analysis and classification. The survival analysis is based on the selected taxa in the presence or absence of prognostic factors.

Usage

CVLasoelascox(
  Survival,
  Censor,
  Micro.mat,
  Prognostic,
  Standardize = TRUE,
  Alpha = 1,
  Fold = 4,
  Ncv = 10,
  nlambda = 100,
  Mean = TRUE,
  Quantile = 0.5
)

Arguments

Survival

A vector of survival times with length equal to the number of subjects.

Censor

A vector of censoring indicators, one per subject.

Micro.mat

A microbiome profile matrix of any size, where the number of rows is equal to the number of taxa and the number of columns is equal to the number of patients.

Prognostic

A data frame containing possible prognostic factor(s) and/or treatment effect to be used in the model.

Standardize

A logical flag for the standardization of the microbiome matrix prior to fitting the model sequence. The coefficients are always returned on the original scale. The default is Standardize = TRUE.

Alpha

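The mixing parameter for glmnet (see glmnet). The range is 0 <= Alpha <= 1, where Alpha = 1 gives the Lasso penalty, Alpha = 0 gives the Ridge penalty, and values in between give the Elastic net. The default is 1.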

Fold

Number of folds to be used for the cross validation. Its value ranges between 3 and the number of subjects in the dataset. The default is 4.

Ncv

Number of cross validations to be carried out. The default is 10.

nlambda

The number of lambda values. The default is 100, as in glmnet.

Mean

A logical flag selecting the cutoff value for the classifier. The default (Mean = TRUE) uses the mean as the cutoff.

Quantile

The quantile to use as the cutoff point. To use a quantile cutoff, specify Mean = FALSE together with the desired quantile. The default is 0.5, the median cutoff.
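
The interplay of Mean and Quantile can be illustrated with a minimal base R sketch. The helper classify_risk, the toy risk scores, the strict ">" comparison and the "low"/"high" labels are all assumptions made for illustration; the rule used inside CVLasoelascox may differ in these details.

```r
# Hypothetical helper (not part of MicrobiomeSurv): split risk scores into
# two groups using either the mean or a chosen quantile as the cutoff.
classify_risk <- function(score, Mean = TRUE, Quantile = 0.5) {
  cutoff <- if (Mean) {
    mean(score)
  } else {
    quantile(score, probs = Quantile, names = FALSE)
  }
  # Assumption: scores strictly above the cutoff are labelled "high"
  ifelse(score > cutoff, "high", "low")
}

# Toy risk scores, made up for this example
risk_score <- c(-1, -0.5, 0, 0.5, 4)

by_mean   <- classify_risk(risk_score)                                 # cutoff = 0.6
by_median <- classify_risk(risk_score, Mean = FALSE, Quantile = 0.5)   # cutoff = 0

table(by_mean)
table(by_median)
```

With a skewed score distribution such as this one, the mean cutoff puts fewer subjects in the high risk group than the median cutoff does, which is one reason to consider Mean = FALSE.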

Details

The function performs cross validation for Lasso, Elastic net and Ridge regression models for the Cox proportional hazards model. Taxa are selected at each iteration and then used for the classifier, which implies that the predictive taxa vary from one cross validation to another depending on the selection. The underlying idea is to investigate the hazard ratio (HR) for the train and test data based on the optimal lambda selected for the non-zero shrinkage coefficients. The taxa with nonzero coefficients are then used in the survival analysis and in the calculation of the risk scores for each set of data.
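
A single iteration of the procedure described above can be sketched roughly as follows, using glmnet and survival directly on simulated data. This is not the package's internal code: the simulated data, the way Fold is used to size the test set, the median cutoff and the helper estimate_hr are assumptions made for illustration, and prognostic factors are left out to keep the sketch short.

```r
library(survival)
library(glmnet)

set.seed(123)
n_subj <- 80; n_taxa <- 10; Fold <- 4

# Simulated microbiome matrix: taxa in rows, subjects in columns
Micro.mat <- matrix(rnorm(n_taxa * n_subj), nrow = n_taxa,
                    dimnames = list(paste0("taxon", 1:n_taxa), NULL))

# Simulated survival data: only the first two taxa affect the hazard
event_time  <- rexp(n_subj, rate = exp(Micro.mat[1, ] - Micro.mat[2, ]))
censor_time <- rexp(n_subj, rate = 0.1)
Survival <- pmin(event_time, censor_time)
Censor   <- as.numeric(event_time <= censor_time)   # 1 = event observed

# Assumption: one fold's worth of subjects is held out as the test set
test_idx <- sample(n_subj, floor(n_subj / Fold))
x_train <- t(Micro.mat[, -test_idx]); x_test <- t(Micro.mat[, test_idx])
y_train <- cbind(time = Survival[-test_idx], status = Censor[-test_idx])

# Penalised Cox model; the optimal lambda is chosen by inner cross validation
cv_fit <- cv.glmnet(x_train, y_train, family = "cox", alpha = 1,
                    nfolds = Fold, nlambda = 100, standardize = TRUE)
beta <- as.matrix(coef(cv_fit, s = "lambda.min"))[, 1]
stopifnot(any(beta != 0))          # the sketch needs at least one selected taxon

# Risk scores from the selected taxa, dichotomised at the training median
score_train <- as.numeric(x_train %*% beta)
score_test  <- as.numeric(x_test %*% beta)
cutoff <- median(score_train)

# Hypothetical helper: HR of the high versus low risk group with 95% limits
estimate_hr <- function(time, status, score, cutoff) {
  group <- as.numeric(score > cutoff)
  fit <- coxph(Surv(time, status) ~ group)
  summary(fit)$conf.int[1, c("exp(coef)", "lower .95", "upper .95")]
}

hr_train <- estimate_hr(Survival[-test_idx], Censor[-test_idx], score_train, cutoff)
hr_test  <- estimate_hr(Survival[test_idx], Censor[test_idx], score_test, cutoff)
names(beta)[beta != 0]             # taxa selected in this iteration
rbind(train = hr_train, test = hr_test)
```

Repeating these steps Ncv times with a fresh random split each time would give one row of coefficients, one lambda and one pair of HR estimates per iteration, which is the shape of the output described under Value.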

Value

An object of class cvle is returned with the following values:

Coef.mat

A matrix of coefficients with the number of rows equal to the number of cross validations and the number of columns equal to the number of taxa.

lambda

A vector of the estimated optimum lambda for each iteration.

n

A vector of the number of selected taxa.

HRTrain

A matrix of survival information for the training dataset. It has three columns representing the estimated HR and the lower and upper limits of its 95% confidence interval.

HRTest

A matrix of survival information for the test dataset. It has three columns representing the estimated HR and the lower and upper limits of its 95% confidence interval.

pld

A vector of the partial likelihood deviance at each cross validation.

Mi.mat

A matrix of 0s and 1s. The number of rows equals the number of iterations and the number of columns equals the number of taxa. A 1 indicates that the particular taxon was selected, that is, had a nonzero coefficient; otherwise the entry is 0.

Micro.mat

The microbiome data matrix that was used for the analysis, either the same as the input Micro.mat or a reduced version.
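
As a small illustration of how a 0/1 selection matrix of this shape can be summarised, the base R sketch below computes how often each taxon was selected across iterations. The toy matrix and its row and column names are made up and only stand in for the Mi.mat value of a fitted object.

```r
# Toy selection matrix: 4 cross validations (rows) by 3 taxa (columns)
Mi.mat <- matrix(c(1, 0, 1,
                   1, 0, 0,
                   1, 1, 0,
                   1, 0, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("cv", 1:4),
                                 c("taxonA", "taxonB", "taxonC")))

# Proportion of iterations in which each taxon had a nonzero coefficient
sel_freq <- colMeans(Mi.mat)
sort(sel_freq, decreasing = TRUE)
# taxonA taxonC taxonB
#   1.00   0.50   0.25

# Number of taxa selected in each iteration
n_selected <- rowSums(Mi.mat)
n_selected
# cv1 cv2 cv3 cv4
#   2   1   2   2
```

Taxa with a selection frequency close to 1 are chosen in almost every iteration and are therefore the more stable candidates for a final signature.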

Author(s)

Thi Huyen Nguyen, thihuyen.nguyen@uhasselt.be

Olajumoke Evangelina Owokotomo, olajumoke.x.owokotomo@gsk.com

Ziv Shkedy

See Also

coxph, EstimateHR, glmnet, Lasoelascox

Examples

# Prepare data
data(Week3_response)
Week3_response = data.frame(Week3_response)

# Survival time (T1Dweek) and censoring indicator (T1D)
surv_fam_shan_w3 = data.frame(cbind(as.numeric(Week3_response$T1Dweek),
                                    as.numeric(Week3_response$T1D)))
colnames(surv_fam_shan_w3) = c("Survival", "Censor")

# Treatment as the prognostic factor
prog_fam_shan_w3 = data.frame(factor(Week3_response$Treatment_new))
colnames(prog_fam_shan_w3) = c("Treatment")

# Microbiome matrix with taxa in rows, named by family
data(fam_shan_trim_w3)
names_fam_shan_trim_w3 =
  c("Unknown", "Lachnospiraceae", "S24.7", "Lactobacillaceae",
    "Enterobacteriaceae", "Rikenellaceae")
fam_shan_trim_w3 = data.matrix(fam_shan_trim_w3[ ,2:82])
rownames(fam_shan_trim_w3) = names_fam_shan_trim_w3

# Using the function
CV_lasso_fam_shan_w3 = CVLasoelascox(Survival = surv_fam_shan_w3$Survival,
                                     Censor = surv_fam_shan_w3$Censor,
                                     Micro.mat = fam_shan_trim_w3,
                                     Prognostic = prog_fam_shan_w3,
                                     Standardize = TRUE,
                                     Alpha = 1,
                                     Fold = 4,
                                     Ncv = 10,
                                     nlambda = 100)

# Number of selected taxa per CV
CV_lasso_fam_shan_w3@n

# Get the matrix of coefficients
CV_lasso_fam_shan_w3@Coef.mat

# Survival information of the train dataset
CV_lasso_fam_shan_w3@HRTrain

# Survival information of the test dataset
CV_lasso_fam_shan_w3@HRTest

[Package MicrobiomeSurv version 0.1.0 Index]