cv_cureem {hdcuremodels} | R Documentation |
Fit penalized mixture cure model using the E-M algorithm with cross-validation for parameter tuning
Description
Fits a penalized parametric or semi-parametric mixture cure model (MCM) using the E-M algorithm with k-fold cross-validation for parameter tuning. The lasso (L1), MCP, and SCAD penalties are supported for the Cox MCM, while only the lasso is currently supported for parametric MCMs. When FDR-controlled variable selection is used, the model-X knockoffs method is applied and the indices of the selected variables are returned.
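For orientation, the structure being fitted can be written as follows. This is a sketch of the standard mixture cure formulation, assuming a logistic incidence model; the symbols b_0, b, and beta correspond to the components listed under Value, while x, w, p, S_u, and h_0 are notation introduced here for illustration only.

```latex
\begin{aligned}
S_{\text{pop}}(t \mid x, w) &= 1 - p(x) + p(x)\, S_u(t \mid w), \\
p(x) &= \frac{\exp(b_0 + x^\top b)}{1 + \exp(b_0 + x^\top b)}, \\
h_u(t \mid w) &= h_0(t)\, \exp(w^\top \beta) \quad \text{(Cox latency)}.
\end{aligned}
```

Here p(x) is taken to be the probability of being susceptible (not cured), S_u is the survival function among susceptible subjects, and the penalties act on b and beta.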
Usage
cv_cureem(
formula,
data,
subset,
x.latency = NULL,
model = "cox",
penalty = "lasso",
penalty.factor.inc = NULL,
penalty.factor.lat = NULL,
fdr.control = FALSE,
fdr = 0.2,
grid.tuning = FALSE,
thresh = 0.001,
scale = TRUE,
maxit = NULL,
inits = NULL,
lambda.inc.list = NULL,
lambda.lat.list = NULL,
nlambda.inc = NULL,
nlambda.lat = NULL,
gamma.inc = 3,
gamma.lat = 3,
lambda.min.ratio.inc = 0.1,
lambda.min.ratio.lat = 0.1,
n_folds = 5,
measure.inc = "c",
one.se = FALSE,
cure_cutoff = 5,
parallel = FALSE,
seed = NULL,
verbose = TRUE,
...
)
Arguments
formula |
an object of class " |
data |
a data.frame in which to interpret the variables named in the |
subset |
an optional expression indicating which subset of observations should be used in the fitting process. Use a numeric or factor variable in subset, not a character variable. All observations are included by default. |
x.latency |
specifies the variables to be included in the latency portion of the model and can be either a matrix of predictors, a model formula with the right hand side specifying the latency variables, or the same data.frame passed to the |
model |
type of regression model to use for the latency portion of mixture cure model. Can be "cox", "weibull", or "exponential" (default is "cox"). |
penalty |
type of penalty function. Can be "lasso", "MCP", or "SCAD" (default is "lasso"). |
penalty.factor.inc |
vector of binary indicators representing the penalty to apply to each incidence coefficient: 0 implies no shrinkage and 1 implies shrinkage. If not supplied, 1 is applied to all incidence variables. |
penalty.factor.lat |
vector of binary indicators representing the penalty to apply to each latency coefficient: 0 implies no shrinkage and 1 implies shrinkage. If not supplied, 1 is applied to all latency variables. |
fdr.control |
logical, if TRUE, model-X knockoffs are used for FDR-controlled variable selection and indices of selected variables are returned (default is FALSE). |
fdr |
numeric value in (0, 1) range specifying the target FDR level to use for variable selection when |
grid.tuning |
logical, if TRUE a 2-D grid tuning approach is used to select the optimal pair of |
thresh |
small numeric value. The iterative process stops when the differences between successive expected penalized complete-data log-likelihoods for both incidence and latency components are less than this specified level of tolerance (default is 10^-3). |
scale |
logical, if TRUE the predictors are centered and scaled (default is TRUE). |
maxit |
maximum number of passes over the data for each lambda. If not specified, 100 is applied when |
inits |
an optional list specifying the initial value for the incidence intercept ( |
lambda.inc.list |
a numeric vector used to search for the optimal |
lambda.lat.list |
a numeric vector used to search for the optimal |
nlambda.inc |
an integer specifying the number of values to search for the optimal |
nlambda.lat |
an integer specifying the number of values to search for the optimal |
gamma.inc |
numeric value for the penalization parameter |
gamma.lat |
numeric value for the penalization parameter |
lambda.min.ratio.inc |
numeric value in (0,1) representing the smallest value for |
lambda.min.ratio.lat |
numeric value in (0,1) representing the smallest value for |
n_folds |
an integer specifying the number of folds for the k-fold cross-validation procedure (default is 5). |
measure.inc |
character string specifying the evaluation criterion used in selecting the optimal |
one.se |
logical, if TRUE then the one standard error rule is applied for selecting the optimal parameters. The one standard error rule selects the most parsimonious model whose evaluation criterion is no more than one standard error worse than the best value of that criterion (default is FALSE). |
cure_cutoff |
numeric value representing the cutoff time: subjects who have not experienced the event by this time are treated as cured. This value is used to produce a proxy for the unobserved cure status when calculating the C-statistic and AUC (default is 5, representing 5 years). Users should note the time scale of their data and adjust this value according to that time scale and the clinical application. |
parallel |
logical. If TRUE, parallel processing is performed for K-fold CV using |
seed |
optional integer representing the random seed. Setting the random seed fosters reproducibility of the results. |
verbose |
logical, if TRUE running information is printed to the console (default is TRUE). |
... |
additional arguments. |
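The one standard error rule described for one.se can be illustrated in base R. This is a minimal sketch, not the package's internal code: the cv data frame, its values, and the helper select_lambda are hypothetical, and it assumes that a larger lambda gives a more parsimonious model and that the standard error of the best-performing lambda sets the tolerance.

```r
# Hypothetical CV summary: one row per candidate lambda (values are made up).
cv <- data.frame(
  lambda = c(0.20, 0.10, 0.05, 0.02, 0.01),
  mean_c = c(0.61, 0.68, 0.72, 0.73, 0.71),  # mean C-statistic across folds
  se_c   = c(0.03, 0.03, 0.02, 0.02, 0.03)   # standard error across folds
)

# Hypothetical helper: pick lambda with or without the one standard error rule.
select_lambda <- function(cv, one_se = FALSE) {
  best <- which.max(cv$mean_c)
  if (!one_se) {
    return(cv$lambda[best])
  }
  # Keep candidates whose criterion is within one SE of the best value
  threshold <- cv$mean_c[best] - cv$se_c[best]
  eligible <- cv$lambda[cv$mean_c >= threshold]
  # Assumption: larger lambda means more shrinkage, hence a sparser model
  max(eligible)
}

lambda_best <- select_lambda(cv, one_se = FALSE)
lambda_1se  <- select_lambda(cv, one_se = TRUE)
```

With these made-up numbers the rule trades a small loss in the criterion for a larger penalty, which is the behavior the one.se argument is meant to enable.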
Value
b0 |
Estimated intercept for the incidence portion of the model. |
b |
Estimated coefficients for the incidence portion of the model. |
beta |
Estimated coefficients for the latency portion of the model. |
alpha |
Estimated shape parameter if the Weibull model is fit. |
rate |
Estimated rate parameter if the Weibull or exponential model is fit. |
logLik.inc |
Expected penalized complete-data log-likelihood for the incidence portion of the model. |
logLik.lat |
Expected penalized complete-data log-likelihood for the latency portion of the model. |
selected.lambda.inc |
Value of |
selected.lambda.lat |
Value of |
max.c |
Maximum C-statistic achieved. |
max.auc |
Maximum AUC for cure prediction achieved; only output when |
selected.index.inc |
Indices of selected variables for the incidence portion of the model when |
selected.index.lat |
Indices of selected variables for the latency portion of the model when |
call |
the matched call. |
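The components above can be inspected like any other list elements. The sketch below uses a hand-built list, mock_fit, as a stand-in for a fitted object so that it runs without fitting a model; the numbers are made up, and the helper nonzero_positions is hypothetical rather than part of the package.

```r
# Hand-built stand-in for a fitted object; real output comes from cv_cureem().
mock_fit <- list(
  b0 = -0.4,                          # incidence intercept
  b = c(0.8, 0, 0, -0.5, 0),          # incidence coefficients
  beta = c(0, 0.6, 0, 0, 0),          # latency coefficients
  selected.lambda.inc = 0.05,
  selected.lambda.lat = 0.10
)

# Hypothetical helper: positions of nonzero coefficients in each model part.
nonzero_positions <- function(fit) {
  list(
    incidence = which(fit$b != 0),
    latency   = which(fit$beta != 0)
  )
}

nz <- nonzero_positions(mock_fit)
nz$incidence  # positions retained in the incidence part
nz$latency    # positions retained in the latency part
```

When FDR-controlled selection is requested, the selected.index.inc and selected.index.lat components already hold the selected indices, so a helper like this would not be needed.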
References
Archer, K. J., Fu, H., Mrozek, K., Nicolet, D., Mims, A. S., Uy, G. L., Stock, W., Byrd, J. C., Hiddemann, W., Braess, J., Spiekermann, K., Metzeler, K. H., Herold, T., Eisfeld, A.-K. (2024) Identifying long-term survivors and those at higher or lower risk of relapse among patients with cytogenetically normal acute myeloid leukemia using a high-dimensional mixture cure model. Journal of Hematology & Oncology, 17:28.
See Also
Examples
library(survival)
library(hdcuremodels)

# Simulate data and keep the training portion
set.seed(1234)
temp <- generate_cure_data(N = 200, J = 25, nTrue = 5, A = 1.8)
training <- temp$Training

# Penalized Cox MCM (the default model) tuned by 2-fold cross-validation
fit.cv <- cv_cureem(Surv(Time, Censor) ~ ., data = training,
                    x.latency = training, fdr.control = FALSE,
                    grid.tuning = FALSE, nlambda.inc = 10, nlambda.lat = 10,
                    n_folds = 2, seed = 23, verbose = TRUE)

# Penalized Weibull MCM with FDR-controlled selection via model-X knockoffs
fit.cv.fdr <- cv_cureem(Surv(Time, Censor) ~ ., data = training,
                        x.latency = training, model = "weibull", penalty = "lasso",
                        fdr.control = TRUE, grid.tuning = FALSE, nlambda.inc = 10,
                        nlambda.lat = 10, n_folds = 2, seed = 23, verbose = TRUE)
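The role of cure_cutoff can be illustrated with a small base R sketch. It is not the package's internal code: the toy data are made up, and treating subjects censored before the cutoff as having unknown status (NA) is an assumption made here for illustration.

```r
# Toy follow-up data in years (made-up values); status 1 = event, 0 = censored.
time   <- c(1.2, 6.5, 3.0, 7.1, 4.9, 5.5)
status <- c(1,   0,   0,   1,   1,   0)
cure_cutoff <- 5

# Proxy cure label: 1 = treated as cured (event-free at the cutoff),
# 0 = event observed by the cutoff, NA = censored before the cutoff
# (the NA handling is an assumption of this sketch).
proxy_cured <- ifelse(time > cure_cutoff, 1,
                      ifelse(status == 1, 0, NA))
proxy_cured
```

If follow-up time were recorded in months rather than years, the equivalent of the default cutoff would be cure_cutoff = 60.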