cox_cure_net {intsurv}    R Documentation

Regularized Cox Cure Rate Model

Description

For right-censored data, fit a regularized Cox cure rate model with an elastic-net penalty following Masud et al. (2018) and Zou and Hastie (2005). For right-censored data with uncertain event status, fit the regularized Cox cure model proposed by Wang et al. (2020). Without regularization, the model reduces to the regular Cox cure rate model (Kuk and Chen, 1992; Sy and Taylor, 2000).
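The mixture cure rate model referenced above combines a logistic incidence part with a Cox latency part. The population survival function is S_pop(t) = 1 - p(z) + p(z) * S_u(t | x), where p(z) = plogis(gamma0 + z'gamma) is the probability of being uncured and S_u(t | x) = S_0(t)^exp(x'beta) is the Cox survival function of the uncured. The base-R sketch below evaluates this formula on a time grid; the helper pop_surv() and its arguments are hypothetical and not part of intsurv.

```r
# Hypothetical helper: population survival under a mixture cure model with a
# logistic incidence part and a Cox (proportional hazards) latency part.
pop_surv <- function(base_surv, x, beta, z, gamma, gamma0 = 0) {
  # probability of being uncured (susceptible), from the logistic model
  p_uncured <- plogis(gamma0 + sum(z * gamma))
  # conditional survival of the uncured: S_0(t)^exp(x' beta)
  s_uncured <- base_surv^exp(sum(x * beta))
  # cured subjects never experience the event, so they contribute 1 - p
  1 - p_uncured + p_uncured * s_uncured
}

base_surv <- c(1, 0.8, 0.5, 0.2, 0)   # baseline survival on a toy time grid
s_pop <- pop_surv(base_surv, x = c(0.5, -1), beta = c(1, 1),
                  z = c(1, 0), gamma = c(1, 1), gamma0 = 0.5)
s_pop   # levels off at the cure fraction 1 - plogis(1.5) instead of 0
```

The plateau of S_pop(t) at 1 - p(z) is what distinguishes a cure rate model from a standard Cox model, whose survival curve would eventually drop to zero.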

Usage

cox_cure_net(
  surv_formula,
  cure_formula,
  time,
  event,
  data,
  subset,
  contrasts = NULL,
  surv_lambda = NULL,
  surv_alpha = 1,
  surv_nlambda = 10,
  surv_lambda_min_ratio = 0.1,
  surv_l1_penalty_factor = NULL,
  cure_lambda = NULL,
  cure_alpha = 1,
  cure_nlambda = 10,
  cure_lambda_min_ratio = 0.1,
  cure_l1_penalty_factor = NULL,
  cv_nfolds = 0,
  surv_start = NULL,
  cure_start = NULL,
  surv_offset = NULL,
  cure_offset = NULL,
  surv_standardize = TRUE,
  cure_standardize = TRUE,
  em_max_iter = 200,
  em_rel_tol = 1e-05,
  surv_max_iter = 10,
  surv_rel_tol = 1e-05,
  cure_max_iter = 10,
  cure_rel_tol = 1e-05,
  tail_completion = c("zero", "exp", "zero-tau"),
  tail_tau = NULL,
  pmin = 1e-05,
  early_stop = TRUE,
  verbose = FALSE,
  ...
)

cox_cure_net.fit(
  surv_x,
  cure_x,
  time,
  event,
  cure_intercept = TRUE,
  surv_lambda = NULL,
  surv_alpha = 1,
  surv_nlambda = 10,
  surv_lambda_min_ratio = 0.1,
  surv_l1_penalty_factor = NULL,
  cure_lambda = NULL,
  cure_alpha = 1,
  cure_nlambda = 10,
  cure_lambda_min_ratio = 0.1,
  cure_l1_penalty_factor = NULL,
  cv_nfolds = 0,
  surv_start = NULL,
  cure_start = NULL,
  surv_offset = NULL,
  cure_offset = NULL,
  surv_standardize = TRUE,
  cure_standardize = TRUE,
  em_max_iter = 200,
  em_rel_tol = 1e-05,
  surv_max_iter = 10,
  surv_rel_tol = 1e-05,
  cure_max_iter = 10,
  cure_rel_tol = 1e-05,
  tail_completion = c("zero", "exp", "zero-tau"),
  tail_tau = NULL,
  pmin = 1e-05,
  early_stop = TRUE,
  verbose = FALSE,
  ...
)

Arguments

surv_formula

A formula object starting with ~ for the survival model part. For the Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with only an intercept term is not allowed.

cure_formula

A formula object starting with ~ for the incidence model part. For the logistic model, an intercept term is included by default and can be excluded by adding + 0 or - 1 to the model formula.

time

A numeric vector for the observed survival times.

event

A numeric vector for the event indicators. NA values are allowed and represent uncertain event status.

data

An optional data frame, list, or environment that contains the covariates and response variables (time and event) included in the model. Variables not found in data are taken from the environment of the specified formula, usually the environment from which this function is called.

subset

An optional logical vector specifying a subset of observations to be used in the fitting process.

contrasts

An optional list whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function, and whose names are the names of columns of data containing factors. See the contrasts.arg argument of model.matrix.default for details.

surv_lambda, cure_lambda

A numeric vector of nonnegative values representing the tuning parameter sequence for the survival model part (surv_lambda) or the incidence model part (cure_lambda). If NULL, a sequence is generated automatically.

surv_alpha, cure_alpha

A number between 0 and 1 for tuning the elastic-net penalty for the survival model part or the incidence model part. If it is one, the elastic-net penalty reduces to the lasso penalty; if it is zero, the ridge penalty is used.
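To make the role of alpha concrete, here is a minimal sketch of the elastic-net penalty of Zou and Hastie (2005) in the common glmnet-style parameterization, lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2). The exact scaling intsurv uses internally is an assumption here, and enet_penalty() is a hypothetical helper.

```r
# Elastic-net penalty, glmnet-style parameterization (assumed scaling):
# lambda * ( alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2 )
enet_penalty <- function(beta, lambda, alpha) {
  stopifnot(lambda >= 0, alpha >= 0, alpha <= 1)
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(1, -2, 0)
enet_penalty(beta, lambda = 0.1, alpha = 1)    # lasso: 0.1 * 3 = 0.3
enet_penalty(beta, lambda = 0.1, alpha = 0)    # ridge: 0.1 * 5 / 2 = 0.25
enet_penalty(beta, lambda = 0.1, alpha = 0.5)  # a mix of both terms
```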

surv_nlambda, cure_nlambda

A positive integer specifying the length of the surv_lambda (or cure_lambda) sequence to be generated when surv_lambda (or cure_lambda) is not specified. The default value is 10.

surv_lambda_min_ratio, cure_lambda_min_ratio

The ratio of the smallest generated surv_lambda (or cure_lambda) to the value large enough to produce all-zero coefficient estimates; the sequence between these two values is spaced on the log scale. The default value is 1e-1.
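When surv_lambda or cure_lambda is NULL, a tuning sequence is generated from the number of values and the minimum ratio. The sketch below builds such a log-spaced sequence, assuming lambda_max (the smallest value giving all-zero estimates) is already known; how intsurv computes lambda_max is not shown, and lambda_seq() is a hypothetical helper.

```r
# Hypothetical helper: nlambda values equally spaced on the log scale,
# from lambda_max down to lambda_max * lambda_min_ratio.
lambda_seq <- function(lambda_max, nlambda = 10, lambda_min_ratio = 0.1) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

lam <- lambda_seq(lambda_max = 2, nlambda = 10, lambda_min_ratio = 0.1)
range(lam)       # from 0.2 up to 2
diff(log(lam))   # constant step on the log scale
```

Spacing on the log scale places more candidate values near small lambda, where the number of selected variables changes fastest.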

surv_l1_penalty_factor, cure_l1_penalty_factor

A numeric vector of nonnegative penalty factors (or weights) on the L1-norm of the coefficient estimate vector in the survival model part or the incidence model part. The penalty is applied to the coefficient estimates divided by the specified weights. The specified weights are rescaled internally so that their sum equals the number of coefficients. If NULL, all weights are set to one.
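The internal rescaling of the penalty factors can be sketched as follows. rescale_penalty_factor() is a hypothetical helper; it only illustrates the stated rule that the weights are rescaled to sum to the number of coefficients, and that NULL means all weights equal one.

```r
# Hypothetical helper: rescale nonnegative penalty factors so that they sum
# to the number of coefficients; NULL means equal weights of one.
rescale_penalty_factor <- function(w, p = length(w)) {
  if (is.null(w)) return(rep(1, p))
  stopifnot(length(w) == p, all(w >= 0), sum(w) > 0)
  w * p / sum(w)
}

rescale_penalty_factor(c(1, 2, 3, 2))   # 0.5 1.0 1.5 1.0 (sums to 4)
rescale_penalty_factor(NULL, p = 3)     # 1 1 1
```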

cv_nfolds

A nonnegative integer specifying the number of folds in cross-validation (CV). The default value is 0, in which case CV is not performed.
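For orientation, K-fold CV splits the observations into cv_nfolds groups, holds out each group in turn, and fits the model on the rest. The sketch below shows one common way to assign observations to folds of nearly equal size; assign_folds() is hypothetical, and intsurv's own fold assignment may differ.

```r
# Hypothetical helper: randomly assign n observations to nfolds folds
# whose sizes differ by at most one.
assign_folds <- function(n, nfolds) {
  stopifnot(nfolds >= 2, nfolds <= n)
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
fold_id <- assign_folds(n = 100, nfolds = 5)
table(fold_id)   # 20 observations in each fold
```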

surv_start

An optional numeric vector representing the starting values for the survival model component. If surv_start = NULL, the starting values are obtained by fitting a regular Cox model to events only.

cure_start

An optional numeric vector representing the starting values for the incidence model component. If cure_start = NULL, the starting values are obtained by fitting a regular logistic model to the non-missing event indicators.

surv_offset

An optional numeric vector representing the offset term in the survival model component. The function will internally try to find values of the specified variable in the data first. Alternatively, one or more offset terms can be specified in the formula (by stats::offset()). If more than one offset term is specified, their sum is used.

cure_offset

An optional numeric vector representing the offset term in the incidence model component. The function will internally try to find values of the specified variable in the data first. Alternatively, one or more offset terms can be specified in the formula (by stats::offset()). If more than one offset term is specified, their sum is used.
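The rule that multiple offset terms are summed matches how base R model frames behave: stats::model.offset() adds up all offset() terms in a formula. The sketch uses toy data; whether intsurv calls model.offset() internally is an assumption.

```r
# Two offset() terms in one formula; stats::model.offset() returns their sum.
dat <- data.frame(x = c(1, 2, 3), o1 = c(0.1, 0.2, 0.3), o2 = c(1, 1, 1))
mf <- model.frame(~ x + offset(o1) + offset(o2), data = dat)
off <- model.offset(mf)
off   # 1.1 1.2 1.3
```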

surv_standardize, cure_standardize

A logical value specifying whether to standardize the covariates for the survival model part or the incidence model part. If TRUE (the default), the covariates are standardized internally to have mean zero and standard deviation one.
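Standardization matters for penalized fits because the penalty treats all coefficients on the same scale. A minimal base-R sketch with scale() follows. Note that scale() uses the sample standard deviation (divisor n - 1); whether intsurv divides by n or n - 1 is not stated on this page.

```r
# Standardize columns to mean zero and standard deviation one.
set.seed(42)
x <- matrix(rnorm(50 * 3, mean = 5, sd = 2), ncol = 3)
x_std <- scale(x, center = TRUE, scale = TRUE)
round(colMeans(x_std), 10)   # all zero
apply(x_std, 2, sd)          # all one
# A coefficient b_std on the standardized scale corresponds to
# b_std / sd_j on the original scale of column j.
attr(x_std, "scaled:scale")
```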

em_max_iter

A positive integer specifying the maximum number of iterations of the EM algorithm. The default value is 200.

em_rel_tol

A positive number specifying the tolerance that determines the convergence of the EM algorithm in terms of the covariate coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, measured by the ratio of the L1-norm of their difference to the sum of their L1-norms. The default value is 1e-5.
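The relative-change criterion described above can be written out directly as ||b_new - b_old||_1 / (||b_new||_1 + ||b_old||_1). In the sketch below, rel_change() is a hypothetical helper, and treating two all-zero vectors as converged is an assumption.

```r
# Relative change between consecutive coefficient estimates:
# ||b_new - b_old||_1 / (||b_new||_1 + ||b_old||_1)
rel_change <- function(b_new, b_old) {
  denom <- sum(abs(b_new)) + sum(abs(b_old))
  if (denom == 0) return(0)   # assumption: both all-zero means converged
  sum(abs(b_new - b_old)) / denom
}

b_old <- c(1.00, -0.50, 0.00)
b_new <- c(1.01, -0.49, 0.00)
rel_change(b_new, b_old)          # 0.02 / 3 = 0.00667
rel_change(b_new, b_old) < 1e-5   # FALSE: not yet converged
```

The same criterion applies to the M-step tolerances surv_rel_tol and cure_rel_tol.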

surv_max_iter, cure_max_iter

A positive integer specifying the maximum number of iterations of the M-step routine for the survival model component or the incidence model component. The default value is 10 to encourage faster convergence.

surv_rel_tol

A positive number specifying the tolerance that determines the convergence of the M-step for the survival model component in terms of the covariate coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, measured by the ratio of the L1-norm of their difference to the sum of their L1-norms. The default value is 1e-5.

cure_rel_tol

A positive number specifying the tolerance that determines the convergence of the M-step for the incidence model component in terms of the covariate coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, measured by the ratio of the L1-norm of their difference to the sum of their L1-norms. The default value is 1e-5.

tail_completion

A character string specifying the tail completion method for the conditional survival function. The available methods are "zero" for zero-tail completion after the largest event time (Sy and Taylor, 2000), "exp" for exponential-tail completion (Peng, 2003), and "zero-tau" for zero-tail completion after a specified tail_tau. The default method is the zero-tail completion proposed by Sy and Taylor (2000).

tail_tau

A numeric value specifying the time point for zero-tail completion. It is used only if tail_completion = "zero-tau". A reasonable choice is a time point between the largest event time and the largest survival time.
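The three tail completion methods differ only in how the estimated step-function survival curve is extended beyond the largest event time tau. The sketch below assumes the exponential tail takes the form S(t) = exp(t / tau * log S(tau)) for t > tau; complete_tail() and its arguments are hypothetical and do not reproduce intsurv's internals.

```r
# Hypothetical helper: tail completion of a right-continuous step survival
# estimate with values 'surv' at jump times 't'; 'tau' is the largest event time.
#   "zero":     S(t) = 0 for t > tau
#   "exp":      S(t) = exp(t / tau * log(S(tau))) for t > tau
#   "zero-tau": S(t) = 0 for t > tail_tau
complete_tail <- function(t_new, t, surv, tau,
                          method = c("zero", "exp", "zero-tau"),
                          tail_tau = NULL) {
  method <- match.arg(method)
  s_fun <- stepfun(t, c(1, surv))   # right-continuous step function
  out <- s_fun(t_new)
  if (method == "zero") {
    out[t_new > tau] <- 0
  } else if (method == "exp") {
    s_tau <- s_fun(tau)
    idx <- t_new > tau
    out[idx] <- exp(t_new[idx] / tau * log(s_tau))
  } else {
    stopifnot(!is.null(tail_tau), tail_tau >= tau)
    out[t_new > tail_tau] <- 0
  }
  out
}

t_new <- c(0.5, 2, 3, 4, 6)
complete_tail(t_new, t = c(1, 2, 3), surv = c(0.8, 0.6, 0.5), tau = 3,
              method = "exp")   # 1, 0.6, 0.5, 0.5^(4/3), 0.25
```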

pmin

A numeric value specifying the minimum value of probabilities for the sake of numerical stability. The default value is 1e-5.
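A lower bound on probabilities keeps terms such as log(p) finite in the log-likelihood. The sketch below clamps probabilities into [p_min, 1 - p_min]; the helper clamp_prob() is hypothetical, the argument is renamed p_min to avoid masking base::pmin(), and bounding from above as well as below is an assumption.

```r
# Hypothetical helper: bound probabilities away from 0 and 1 before taking logs.
clamp_prob <- function(p, p_min = 1e-5) {
  pmin(pmax(p, p_min), 1 - p_min)
}

p <- c(0, 1e-8, 0.3, 1)
clamp_prob(p)              # values now lie in [1e-5, 1 - 1e-5]
sum(log(clamp_prob(p)))    # finite, unlike sum(log(p))
```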

early_stop

A logical value specifying whether to stop the iteration once the negative log-likelihood unexpectedly increases, which may suggest convergence in likelihood or indicate numerical issues or implementation bugs. The default value is TRUE.
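The interplay of em_max_iter, em_rel_tol, and early_stop can be sketched as a generic iteration loop. This is not intsurv's EM implementation: iterate(), step, and negloglik are hypothetical, and the toy usage runs plain gradient steps on a quadratic.

```r
# Hypothetical iteration driver: 'step' maps current coefficients to updated
# ones and 'negloglik' evaluates the objective being minimized.
iterate <- function(beta0, step, negloglik, max_iter = 200, rel_tol = 1e-5,
                    early_stop = TRUE) {
  stopifnot(max_iter >= 1)
  beta <- beta0
  obj <- negloglik(beta)
  for (i in seq_len(max_iter)) {
    beta_new <- step(beta)
    obj_new <- negloglik(beta_new)
    if (early_stop && obj_new > obj) {
      # objective increased: keep the previous estimate and stop
      return(list(beta = beta, iter = i, stopped_early = TRUE))
    }
    denom <- sum(abs(beta_new)) + sum(abs(beta))
    delta <- if (denom > 0) sum(abs(beta_new - beta)) / denom else 0
    beta <- beta_new
    obj <- obj_new
    if (delta < rel_tol) break
  }
  list(beta = beta, iter = i, stopped_early = FALSE)
}

# Toy usage: gradient steps on f(b) = sum((b - 2)^2)
f <- function(b) sum((b - 2)^2)
g_step <- function(b) b - 0.25 * 2 * (b - 2)
res <- iterate(c(0, 0), g_step, f)
res$beta   # close to c(2, 2)
```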

verbose

A logical value. If TRUE, verbose information is printed during the iterations for tracing convergence. The default value is FALSE.

...

Other arguments for future usage. A warning will be thrown if any invalid argument is specified.

surv_x

A numeric matrix for the design matrix of the survival model component.

cure_x

A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless a model including only the intercept term is intended. In that case, also set cure_intercept = FALSE so that the intercept term is not standardized.

cure_intercept

A logical value specifying whether to add an intercept term to the cure rate model component. If TRUE (the default), an intercept term is included.
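For cox_cure_net.fit(), the design matrices surv_x and cure_x must be built beforehand. A minimal base-R sketch using model.matrix() on hypothetical toy data, dropping the intercept column in both cases (the Cox part has no intercept, and the incidence intercept is added through cure_intercept = TRUE):

```r
# Build design matrices from formulas with base R (toy data).
set.seed(1)
dat <- data.frame(x1 = rnorm(6), x2 = rnorm(6),
                  grp = factor(c("a", "b", "c", "a", "b", "c")))
# drop the "(Intercept)" column; factors keep treatment-contrast dummies
surv_x <- model.matrix(~ x1 + grp, data = dat)[, -1, drop = FALSE]
cure_x <- model.matrix(~ x1 + x2, data = dat)[, -1, drop = FALSE]
colnames(surv_x)   # "x1" "grpb" "grpc"
colnames(cure_x)   # "x1" "x2"
# These could then be passed as, e.g.,
# cox_cure_net.fit(surv_x, cure_x, time, event)
# where 'time' and 'event' are the observed times and event indicators.
```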

Details

The model is estimated by the expectation-maximization (EM) algorithm. Variable selection through elastic-net regularization is implemented with cyclic coordinate descent and the majorization-minimization (MM) algorithm.
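The core of cyclic coordinate descent for an elastic-net penalty is a soft-thresholding update applied to one coefficient at a time. In this package the updates are combined with MM surrogates of the Cox and logistic objectives; the sketch below substitutes a plain least-squares loss, (1/(2n))||y - X b||^2, purely to show the coordinate update. It is not intsurv's code, and enet_cd() is a hypothetical helper.

```r
# Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Cyclic coordinate descent for elastic-net penalized least squares
enet_cd <- function(x, y, lambda, alpha = 1, max_iter = 100, tol = 1e-8) {
  n <- nrow(x)
  beta <- numeric(ncol(x))
  r <- y - drop(x %*% beta)                 # current residual
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_along(beta)) {
      xj2 <- sum(x[, j]^2) / n
      # inner product of x_j with the partial residual (excluding coordinate j)
      zj <- sum(x[, j] * r) / n + xj2 * beta[j]
      bj <- soft_threshold(zj, lambda * alpha) / (xj2 + lambda * (1 - alpha))
      r <- r - x[, j] * (bj - beta[j])      # keep the residual up to date
      beta[j] <- bj
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

set.seed(7)
x <- scale(matrix(rnorm(100 * 5), ncol = 5))
y <- drop(x %*% c(2, -1, 0, 0, 0)) + rnorm(100, sd = 0.5)
round(enet_cd(x, y, lambda = 0.1, alpha = 0.8), 3)
```

With lambda = 0 the updates reduce to Gauss-Seidel steps for ordinary least squares, and for lambda large enough every coefficient stays at zero, which is what defines the top of the lambda sequence.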

Value

A cox_cure_net object for the regular Cox cure rate model, or a cox_cure_net_uncer object for the Cox cure rate model with uncertain events.

References

Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531–541.

Masud, A., Tu, W., & Yu, Z. (2018). Variable selection for mixture and promotion time cure rate models. Statistical Methods in Medical Research, 27(7), 2185–2199.

Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187–201.

Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227–236.

Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2020). Suicide Risk Modeling with Uncertain Diagnostic Records. arXiv preprint arXiv:2009.02597.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

See Also

cox_cure for regular Cox cure rate model.

Examples

library(intsurv)

### regularized Cox cure rate model ==================================
## simulate toy right-censored data with a cure fraction
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 1, p2 = 1, p3 = 1)

## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
                         surv_nlambda = 10, cure_nlambda = 10,
                         surv_alpha = 0.8, cure_alpha = 0.8)

## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
                     time = obs_time, event = obs_event,
                     surv_alpha = 0.5, cure_alpha = 0.5)

## BIC values
BIC(fit1)
BIC(fit2)

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)


### regularized Cox cure model with uncertain event status ===========
## simulate toy data
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 0.95, p2 = 0.95, p3 = 0.95)

## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
                         surv_nlambda = 5, cure_nlambda = 5,
                         surv_alpha = 0.8, cure_alpha = 0.8)

## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
                     time = obs_time, event = obs_event,
                     surv_nlambda = 5, cure_nlambda = 5,
                     surv_alpha = 0.5, cure_alpha = 0.5)

## BIC values
BIC(fit1)
BIC(fit2)

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)

[Package intsurv version 0.2.2 Index]