pencox {pencal}

R Documentation

Estimation of a penalized Cox model with time-independent covariates

Description

This function estimates a penalized Cox model where only time-independent covariates are included as predictors, and then runs a bootstrap optimism correction procedure to validate the predictive performance of the model.
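
The general idea of a bootstrap optimism correction can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that `pencox` runs: it uses simulated data, an unpenalized `survival::coxph` fit as a stand-in for the penalized model, and the C index as the only performance measure. The names `dat`, `c.index`, `optimism` and `c.corrected` are hypothetical.

```r
library(survival)

set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# simulated event times depend on x1 only; censoring is independent
event.time <- rexp(n, rate = exp(0.8 * dat$x1))
cens.time <- rexp(n, rate = 0.3)
dat$time <- pmin(event.time, cens.time)
dat$event <- as.numeric(event.time <= cens.time)

# C index of a fitted Cox model, evaluated on a given dataset
c.index <- function(fit, newdata) {
  lp <- predict(fit, newdata = newdata, type = "lp")
  # reverse = TRUE: a higher linear predictor means shorter survival
  concordance(Surv(newdata$time, newdata$event) ~ lp,
              reverse = TRUE)$concordance
}

# step 1: apparent performance (model fitted and evaluated on the same data)
fit.orig <- coxph(Surv(time, event) ~ x1 + x2, data = dat)
c.apparent <- c.index(fit.orig, dat)

# step 2: estimate the optimism from bootstrap samples
n.boots <- 50
optimism <- numeric(n.boots)
for (b in seq_len(n.boots)) {
  boot.dat <- dat[sample(n, replace = TRUE), ]
  fit.boot <- coxph(Surv(time, event) ~ x1 + x2, data = boot.dat)
  # performance on the bootstrap sample minus performance on the original data
  optimism[b] <- c.index(fit.boot, boot.dat) - c.index(fit.boot, dat)
}

# step 3: optimism-corrected estimate
c.corrected <- c.apparent - mean(optimism)
c(apparent = c.apparent, corrected = c.corrected)
```

The apparent C index is typically too optimistic because the same subjects are used for fitting and evaluation; subtracting the average optimism gives an estimate that is closer to the performance expected on new data.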

Usage

pencox(data, formula, penalty = "ridge", standardize = TRUE,
  penalty.factor = 1, n.alpha.elnet = 11, n.folds.elnet = 5,
  n.boots = 0, n.cores = 1, verbose = TRUE)

Arguments

`data`	a data frame with one row for each subject. It should contain at least a subject id (called `id`), the time-to-event outcome (`time`), and the binary censoring indicator (`event`), plus at least one covariate to be included in the linear predictor
`formula`	a formula specifying the variables in `data` to include as predictors in the penalized Cox model
`penalty`	the type of penalty function used for regularization. Default is `'ridge'`, other possible values are `'elasticnet'` and `'lasso'`
`standardize`	logical argument: should the covariates be standardized when included in the penalized Cox model? Default is `TRUE`
`penalty.factor`	a single value, or a vector of values, indicating whether the covariates (if any) should be penalized (1) or not (0). Default is `penalty.factor = 1`
`n.alpha.elnet`	number of alpha values for the two-dimensional grid of tuning parameters in elasticnet. Only relevant if `penalty = 'elasticnet'`. Default is 11, so that the resulting alpha grid is `c(1, 0.9, 0.8, ..., 0.1, 0)`
`n.folds.elnet`	number of folds to be used for the selection of the tuning parameter in elasticnet. Only relevant if `penalty = 'elasticnet'`. Default is 5
`n.boots`	number of bootstrap samples to be used in the bootstrap optimism correction procedure. If 0, no bootstrapping is performed
`n.cores`	number of cores to use to parallelize the computation of the bootstrap optimism correction procedure. If `n.cores = 1` (default), no parallelization is done. Pro tip: you can use `parallel::detectCores()` to check how many cores are available on your computer
`verbose`	if `TRUE` (default and recommended value), information on the ongoing computations is printed in the console
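
Two of the arguments above can be illustrated with base R alone. The sketch below is not taken from the package source and rests on two assumptions: that the alpha grid is evenly spaced between 1 (lasso) and 0 (ridge), which matches the documented default grid for `n.alpha.elnet = 11`, and that the entries of a vector-valued `penalty.factor` follow the order of the covariates in `formula`. The names `alpha.grid`, `unpenalized` and `pen.factor` are hypothetical.

```r
# alpha grid for elasticnet: assumed evenly spaced from 1 (lasso) to 0 (ridge)
n.alpha.elnet <- 11
alpha.grid <- seq(1, 0, length.out = n.alpha.elnet)
print(alpha.grid)

# penalty.factor as a vector: 0 = covariate not penalized, 1 = penalized
form <- ~ baseline.age + marker1 + marker2 + marker3 + marker4
covariates <- all.vars(form)
unpenalized <- "baseline.age"  # hypothetical choice, for illustration only
pen.factor <- as.numeric(!(covariates %in% unpenalized))
names(pen.factor) <- covariates
print(pen.factor)
```

Under the ordering assumption stated above, such a vector could then be supplied as `penalty.factor = unname(pen.factor)`. If `formula` contains factors, the design matrix may have more columns than there are variables, so the length of the vector should be checked against the design matrix before use.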

Value

A list containing the following objects:

call: the function call;
pcox.orig: the penalized Cox model fitted on the original dataset;
surv.data: a data frame with the survival data;
X.orig: a data frame with the design matrix used to estimate the Cox model;
n.boots: number of bootstrap samples;
boot.ids: a list with the ids of bootstrapped subjects (when n.boots > 0);
pcox.boot: a list where each element is a fitted penalized Cox model for a given bootstrap sample (when n.boots > 0).

Author(s)

Mirko Signorelli

References

Signorelli, M. (2024). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. To appear in: The R Journal. Preprint: arXiv:2309.15600

Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40(27), 6178-6196. DOI: 10.1002/sim.9178

Examples

# load the package
library(pencal)
# generate example data
set.seed(1234)
p = 4 # number of longitudinal predictors
simdata = simulate_prclmm_data(n = 100, p = p, p.relev = 2, 
             seed = 123, t.values = c(0, 0.2, 0.5, 1, 1.5, 2))
# create data frame with baseline measurements only
baseline.visits = simdata$long.data[which(!duplicated(simdata$long.data$id)),]
df = merge(simdata$surv.data, baseline.visits, by = 'id')
df = df[ , -c(5:6)]

do.bootstrap = FALSE
# IMPORTANT: set do.bootstrap = TRUE to compute the optimism correction!
n.boots = ifelse(do.bootstrap, 100, 0)
more.cores = FALSE
# IMPORTANT: set more.cores = TRUE to speed computations up!
if (more.cores) {
   # identify number of available cores on your machine
   n.cores = parallel::detectCores()
   if (is.na(n.cores)) n.cores = 2
} else {
   n.cores = 2
}

form = as.formula(~ baseline.age + marker1 + marker2
                     + marker3 + marker4)
base.pcox = pencox(data = df, 
              formula = form, 
              n.boots = n.boots, n.cores = n.cores) 
ls(base.pcox)

[Package pencal version 2.2.2 Index]