cv.sb.splsdrcox {Coxmos} | R Documentation |
SB.sPLS-DRCOX Cross-Validation
Description
This function performs cross-validation for the single-block sparse partial least squares deviance residual Cox model (SB.sPLS-DRCOX). It returns the optimal number of components and the optimal sparsity penalty based on cross-validation. Performance can be assessed with several metrics, such as Area Under the Curve (AUC), Brier Score or C-Index, and more than one metric can be used simultaneously.
Usage
cv.sb.splsdrcox(
X,
Y,
max.ncomp = 8,
penalty.list = seq(0.1, 0.9, 0.2),
n_run = 3,
k_folds = 10,
x.center = TRUE,
x.scale = FALSE,
remove_near_zero_variance = TRUE,
remove_zero_variance = TRUE,
toKeep.zv = NULL,
remove_variance_at_fold_level = FALSE,
remove_non_significant_models = FALSE,
remove_non_significant = FALSE,
alpha = 0.05,
w_AIC = 0,
w_c.index = 0,
w_AUC = 1,
w_BRIER = 0,
times = NULL,
max_time_points = 15,
MIN_AUC_INCREASE = 0.01,
MIN_AUC = 0.8,
MIN_COMP_TO_CHECK = 3,
pred.attr = "mean",
pred.method = "cenROC",
fast_mode = FALSE,
MIN_EPV = 5,
return_models = FALSE,
returnData = FALSE,
PARALLEL = FALSE,
verbose = FALSE,
seed = 123
)
Arguments
X |
Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables. |
Y |
Numeric matrix or data.frame. Response variables. The object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively. |
max.ncomp |
Numeric. Maximum number of PLS components to compute for the cross validation (default: 8). |
penalty.list |
Numeric vector. Penalty values to evaluate for sPLS-DRCOX. If penalty = 0, no penalty is applied; if penalty = 1, the maximum penalty is applied (no variables are selected), based on the 'plsRcox' penalty. Values equal to or greater than 1 cannot be selected (default: seq(0.1,0.9,0.2)). |
n_run |
Numeric. Number of runs for cross validation (default: 3). |
k_folds |
Numeric. Number of folds for cross validation (default: 10). |
x.center |
Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE). |
x.scale |
Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE). |
remove_near_zero_variance |
Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE). |
remove_zero_variance |
Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE). |
toKeep.zv |
Character vector. Names of variables in X that must not be removed by (near) zero variance filtering (default: NULL). |
remove_variance_at_fold_level |
Logical. If remove_variance_at_fold_level = TRUE, (near) zero variance variables will be removed at fold level (default: FALSE). |
remove_non_significant_models |
Logical. If remove_non_significant_models = TRUE, non-significant models are removed before computing the evaluation. A non-significant model is a model with at least one component/variable with a P-Value higher than the alpha cutoff (default: FALSE). |
remove_non_significant |
Logical. If remove_non_significant = TRUE, non-significant variables/components in the final Cox model will be removed by forward selection until all remaining variables are significant (default: FALSE). |
alpha |
Numeric. Significance threshold. P-Values below this threshold are regarded as significant (default: 0.05). |
w_AIC |
Numeric. Weight for the AIC evaluator. All weights must sum to 1 (default: 0). |
w_c.index |
Numeric. Weight for the C-Index evaluator. All weights must sum to 1 (default: 0). |
w_AUC |
Numeric. Weight for the AUC evaluator. All weights must sum to 1 (default: 1). |
w_BRIER |
Numeric. Weight for the Brier Score evaluator. All weights must sum to 1 (default: 0). |
times |
Numeric vector. Time points at which the AUC will be evaluated. If NULL, at most 'max_time_points' equally spaced time points will be selected (default: NULL). |
max_time_points |
Numeric. Maximum number of time points to use for evaluating the model (default: 15). |
MIN_AUC_INCREASE |
Numeric. Minimum improvement between successive cross-validation models required to continue evaluating higher values of the tested parameters. If this improvement is not reached for the next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value has been reached, the evaluation stops (default: 0.01). |
MIN_AUC |
Numeric. Minimum AUC that the cross-validation models should reach. Once this minimum is reached, the evaluation can stop if subsequent models do not improve the AUC by at least 'MIN_AUC_INCREASE' (default: 0.8). |
MIN_COMP_TO_CHECK |
Numeric. Number of penalties/components to evaluate when checking whether the AUC improves. If the AUC does not improve for the next 'MIN_COMP_TO_CHECK' models and 'MIN_AUC' is met, the evaluation can stop (default: 3). |
pred.attr |
Character. How the selected metric is summarized. Must be one of the following: "mean" or "median" (default: "mean"). |
pred.method |
Character. AUC evaluation algorithm used to evaluate model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC"). |
fast_mode |
Logical. If fast_mode = TRUE, for each run, each fold is evaluated on its own. If fast_mode = FALSE, for each run, the linear predictors are first computed for the test observations of every fold; once all observations have their linear predictors, the evaluation is performed across all observations together (default: FALSE). |
MIN_EPV |
Numeric. Minimum number of Events Per Variable (EPV) required for the final Cox model. Used to restrict the number of variables/components that can be included in final Cox models. If the minimum is not met, the model cannot be computed (default: 5). |
return_models |
Logical. Return all models computed in cross validation (default: FALSE). |
returnData |
Logical. Return original and normalized X and Y matrices (default: FALSE). |
PARALLEL |
Logical. Run the cross validation with the multicore option. The number of cores used is the total number of cores minus one. This can lead to higher RAM consumption (default: FALSE). |
verbose |
Logical. If verbose = TRUE, extra messages could be displayed (default: FALSE). |
seed |
Number. Seed value for performing runs/folds divisions (default: 123). |
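The interplay between MIN_AUC, MIN_AUC_INCREASE and MIN_COMP_TO_CHECK can be easier to follow with a small sketch. The code below is one possible reading of the argument descriptions above, written in base R; stop_index() is a hypothetical helper and is not part of Coxmos, and the package's internal rule may differ in its details.

```r
# Hypothetical helper (not a Coxmos function): given the AUC obtained for
# successive models, return the position at which evaluation would stop
# under one reading of the documented stopping rule.
stop_index <- function(auc, MIN_AUC = 0.8, MIN_AUC_INCREASE = 0.01,
                       MIN_COMP_TO_CHECK = 3) {
  best <- 1  # position of the current reference model
  for (i in seq_along(auc)[-1]) {
    if (auc[i] - auc[best] >= MIN_AUC_INCREASE) {
      # Meaningful improvement: this model becomes the new reference.
      best <- i
    } else if (auc[best] >= MIN_AUC && i - best >= MIN_COMP_TO_CHECK) {
      # MIN_AUC was reached and the last MIN_COMP_TO_CHECK models
      # did not improve enough: stop here.
      return(i)
    }
  }
  length(auc)  # the rule never triggered, so every model was evaluated
}

# Made-up AUC values for seven successive models (illustration only).
auc <- c(0.70, 0.82, 0.825, 0.82, 0.81, 0.83, 0.84)
stop_index(auc)
```

With these made-up values the second model reaches MIN_AUC, and the three models after it fail to add MIN_AUC_INCREASE, so the later models would not be evaluated.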
Details
The cv.sb.splsdrcox function performs cross-validation for the single-block sparse partial least
squares deviance residual Cox analysis. Cross-validation is a robust method to evaluate the
performance of a statistical model by partitioning the original sample into a training set to train
the model, and a test set to evaluate it. This helps in selecting the optimal hyperparameters for
the model, such as the number of latent components (max.ncomp) and the penalty for variable
selection (penalty.list).
The function systematically evaluates different combinations of hyperparameters by performing
multiple runs and folds. For each combination, the dataset is divided into training and test sets
based on the specified number of folds (k_folds). The model is then trained on the training set
and evaluated on the test set. This process is repeated for the specified number of runs (n_run),
ensuring a comprehensive evaluation of the model's performance across different partitions of the
data.
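The run/fold scheme described above can be sketched in base R. This is only an illustration of repeated k-fold partitioning; make_folds() is a hypothetical helper, not a Coxmos function, and the package may build its partitions differently (for example, by balancing events across folds).

```r
# Hypothetical helper (not a Coxmos function): build train/test indexes
# for n_run repetitions of k_folds-fold cross-validation.
make_folds <- function(n, k_folds, n_run, seed = 123) {
  set.seed(seed)
  lapply(seq_len(n_run), function(run) {
    # Give each observation a random fold label, keeping fold sizes balanced.
    fold_id <- sample(rep(seq_len(k_folds), length.out = n))
    lapply(seq_len(k_folds), function(fold) {
      list(train = which(fold_id != fold), test = which(fold_id == fold))
    })
  })
}

folds <- make_folds(n = 20, k_folds = 5, n_run = 2)
length(folds)         # number of runs
length(folds[[1]])    # number of folds in the first run
folds[[1]][[1]]$test  # test indexes for run 1, fold 1
```

Within a run, every observation appears in exactly one test set, so each model is always evaluated on observations it was not trained on.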
Various evaluation metrics, such as AIC, C-Index, Brier Score, and AUC, are computed for each combination of hyperparameters. These metrics provide insights into the model's accuracy, discriminative ability, and calibration. The function then identifies the optimal hyperparameters that yield the best performance based on the specified evaluation metrics.
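As a rough illustration of how several metrics can be weighted into one score, the sketch below checks that the weights sum to 1 and returns their weighted sum. combine_metrics() is a hypothetical helper, not a Coxmos function, and it assumes every metric passed in is already on a comparable scale where higher is better. AIC and Brier Score are lower-is-better, so they would need to be rescaled first; how the package handles that is not described on this page.

```r
# Hypothetical helper (not a Coxmos function): weighted combination of
# evaluation metrics. Assumes all metrics are "higher is better".
combine_metrics <- function(metrics, weights) {
  stopifnot(setequal(names(metrics), names(weights)))
  if (!isTRUE(all.equal(sum(weights), 1))) {
    stop("All weights must sum to 1")
  }
  # Align metrics to the order of the weights before multiplying.
  sum(metrics[names(weights)] * weights)
}

# Made-up values for one hyperparameter combination (illustration only).
weights <- c(c.index = 0.5, AUC = 0.5)
metrics <- c(AUC = 0.80, c.index = 0.70)
combine_metrics(metrics, weights)
```

Computing the same score for every combination of component number and penalty would then allow the combination with the highest value to be chosen.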
The function also offers flexibility in data preprocessing, such as centering and scaling of the
explanatory variables, removal of near-zero variance variables, and more. Additionally, users can
specify the AUC evaluation algorithm method (pred.method) and control the verbosity of the
output (verbose).
The output provides a comprehensive overview of the cross-validation results, including detailed information at the fold, run, and component levels. Visualization tools, such as plots for AIC, C-Index, Brier Score, and AUC, are also provided to aid in understanding the model's performance across different hyperparameters.
In summary, the cv.sb.splsdrcox function offers a robust approach for hyperparameter tuning and
model evaluation for the single-block sparse partial least squares deviance residual Cox analysis.
It helps select a final model that is both accurate and generalizable to new data.
Value
Instance of class "Coxmos" and model "cv.SB.sPLS-DRCOX".
best_model_info: A data.frame with the information for the best model.
df_results_folds: A data.frame with fold-level information.
df_results_runs: A data.frame with run-level information.
df_results_comps: A data.frame with component-level information (for cv.coxEN, EN.alpha information).
lst_models: If return_models = TRUE, the list of all cross-validated models.
pred.method: AUC evaluation algorithm used to evaluate model performance.
opt.comp: Optimal number of components selected by the best model.
opt.penalty: Optimal penalty selected by the best model.
opt.nvar: Optimal number of variables selected by the best model.
plot_AIC: AIC plot by each hyper-parameter.
plot_c_index: C-Index plot by each hyper-parameter.
plot_BRIER: Brier Score plot by each hyper-parameter.
plot_AUC: AUC plot by each hyper-parameter.
class: Cross-Validated model class.
lst_train_indexes: List (of lists) of indexes of the observations used in each run/fold to train the models.
lst_test_indexes: List (of lists) of indexes of the observations used in each run/fold to test the models.
time: Time consumed running the cross-validated function.
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
Examples
data("X_multiomic")
data("Y_multiomic")
set.seed(123)
index_train <- caret::createDataPartition(Y_multiomic$event, p = .5, list = FALSE, times = 1)
X_train <- X_multiomic
X_train$mirna <- X_train$mirna[index_train,1:50]
X_train$proteomic <- X_train$proteomic[index_train,1:50]
Y_train <- Y_multiomic[index_train,]
cv.sb.splsdrcox_model <- cv.sb.splsdrcox(X_train, Y_train, max.ncomp = 2, penalty.list = c(0.5),
n_run = 1, k_folds = 2, x.center = TRUE, x.scale = TRUE)