cv.splsdrcox {Coxmos}R Documentation

sPLS-DRCOX Cross-Validation

Description

This function performs cross-validated sparse partial least squares DRCox (sPLS-DRCOX). The function returns the optimal number of components and the optimal sparsity penalty value based on cross-validation. The performance could be based on multiple metrics as Area Under the Curve (AUC), Brier Score or C-Index. Furthermore, the user could establish more than one metric simultaneously.

Usage

cv.splsdrcox(
  X,
  Y,
  max.ncomp = 8,
  penalty.list = seq(0.1, 0.9, 0.2),
  n_run = 3,
  k_folds = 10,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  remove_variance_at_fold_level = FALSE,
  remove_non_significant_models = FALSE,
  remove_non_significant = FALSE,
  alpha = 0.05,
  w_AIC = 0,
  w_c.index = 0,
  w_AUC = 1,
  w_BRIER = 0,
  times = NULL,
  max_time_points = 15,
  MIN_AUC_INCREASE = 0.01,
  MIN_AUC = 0.8,
  MIN_COMP_TO_CHECK = 3,
  pred.attr = "mean",
  pred.method = "cenROC",
  fast_mode = FALSE,
  MIN_EPV = 5,
  return_models = FALSE,
  returnData = FALSE,
  PARALLEL = FALSE,
  verbose = FALSE,
  seed = 123
)

Arguments

X

Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transform into binary variables.

Y

Numeric matrix or data.frame. Response variables. Object must have two columns named as "time" and "event". For event column, accepted values are: 0/1 or FALSE/TRUE for censored and event observations.

max.ncomp

Numeric. Maximum number of PLS components to compute for the cross validation (default: 8).

penalty.list

Numeric vector. Vector of penalty values. Penalty for sPLS-DRCOX. If penalty = 0 no penalty is applied, when penalty = 1 maximum penalty (no variables are selected) based on 'plsRcox' penalty. Equal or greater than 1 cannot be selected (default: seq(0.1,0.9,0.2)).

n_run

Numeric. Number of runs for cross validation (default: 3).

k_folds

Numeric. Number of folds for cross validation (default: 10).

x.center

Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE).

x.scale

Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE).

remove_near_zero_variance

Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE).

remove_zero_variance

Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE).

toKeep.zv

Character vector. Name of variables in X to not be deleted by (near) zero variance filtering (default: NULL).

remove_variance_at_fold_level

Logical. If remove_variance_at_fold_level = TRUE, (near) zero variance will be removed at fold level (default: FALSE).

remove_non_significant_models

Logical. If remove_non_significant_models = TRUE, non-significant models are removed before computing the evaluation. A non-significant model is a model with at least one component/variable with a P-Value higher than the alpha cutoff.

remove_non_significant

Logical. If remove_non_significant = TRUE, non-significant variables/components in final cox model will be removed until all variables are significant by forward selection (default: FALSE).

alpha

Numeric. Numerical values are regarded as significant if they fall below the threshold (default: 0.05).

w_AIC

Numeric. Weight for AIC evaluator. All weights must sum 1 (default: 0).

w_c.index

Numeric. Weight for C-Index evaluator. All weights must sum 1 (default: 0).

w_AUC

Numeric. Weight for AUC evaluator. All weights must sum 1 (default: 1).

w_BRIER

Numeric. Weight for BRIER SCORE evaluator. All weights must sum 1 (default: 0).

times

Numeric vector. Time points where the AUC will be evaluated. If NULL, a maximum of 'max_time_points' points will be selected equally distributed (default: NULL).

max_time_points

Numeric. Maximum number of time points to use for evaluating the model (default: 15).

MIN_AUC_INCREASE

Numeric. Minimum improvement between different cross validation models to continue evaluating higher values in the multiple tested parameters. If it is not reached for next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value is reached, the evaluation stops (default: 0.01).

MIN_AUC

Numeric. Minimum AUC desire to reach cross-validation models. If the minimum is reached, the evaluation could stop if the improvement does not reach an AUC higher than adding the 'MIN_AUC_INCREASE' value (default: 0.8).

MIN_COMP_TO_CHECK

Numeric. Number of penalties/components to evaluate to check if the AUC improves. If for the next 'MIN_COMP_TO_CHECK' the AUC is not better and the 'MIN_AUC' is meet, the evaluation could stop (default: 3).

pred.attr

Character. Way to evaluate the metric selected. Must be one of the following: "mean" or "median" (default: "mean").

pred.method

Character. AUC evaluation algorithm method for evaluate the model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC").

fast_mode

Logical. If fast_mode = TRUE, for each run, only one fold is evaluated simultaneously. If fast_mode = FALSE, for each run, all linear predictors are computed for test observations. Once all have their linear predictors, the evaluation is perform across all the observations together (default: FALSE).

MIN_EPV

Numeric. Minimum number of Events Per Variable (EPV) you want reach for the final cox model. Used to restrict the number of variables/components can be computed in final cox models. If the minimum is not meet, the model cannot be computed (default: 5).

return_models

Logical. Return all models computed in cross validation (default: FALSE).

returnData

Logical. Return original and normalized X and Y matrices (default: TRUE).

PARALLEL

Logical. Run the cross validation with multicore option. As many cores as your total cores - 1 will be used. It could lead to higher RAM consumption (default: FALSE).

verbose

Logical. If verbose = TRUE, extra messages could be displayed (default: FALSE).

seed

Number. Seed value for performing runs/folds divisions (default: 123).

Details

The ⁠sPLS-DRCOX Cross-Validation⁠ function offers a robust approach to fine-tune the hyperparameters of the sPLS-DRCOX model, ensuring optimal performance in survival analysis tasks. By systematically evaluating different combinations of hyperparameters, this function identifies the best model configuration that minimizes prediction error.

Cross-validation is a crucial step in survival analysis, especially when dealing with high-dimensional datasets. It provides an unbiased assessment of the model's generalization capability, safeguarding against overfitting. This function employs a k-fold cross-validation strategy, partitioning the data into multiple subsets (folds) and iteratively using each fold as a test set while the remaining folds serve as training data.

One of the primary strengths of this function is its flexibility. Users can specify a range of values for the number of PLS components and the penalty parameter penalty. The function then evaluates all possible combinations, returning the optimal configuration that yields the best predictive performance.

Additionally, the function offers advanced features like parallel processing for faster computation, and the ability to return all models from the cross-validation process. This is particularly useful for in-depth analysis and comparisons.

The output provides comprehensive insights, including performance metrics for each fold, run, and hyperparameter combination. Visualization plots like AIC, C-Index, Brier Score, and AUC plots further aid in understanding the model's performance across different configurations.

Value

Instance of class "Coxmos" and model "cv.sPLS-DRCOX". best_model_info: A data.frame with the information for the best model. df_results_folds: A data.frame with fold-level information. df_results_runs: A data.frame with run-level information. df_results_comps: A data.frame with component-level information (for cv.coxEN, EN.alpha information).

lst_models: If return_models = TRUE, return a the list of all cross-validated models. pred.method: AUC evaluation algorithm method for evaluate the model performance.

opt.comp: Optimal component selected by the best_model. opt.penalty: Optimal penalty/penalty selected by the best_model. opt.nvar: Optimal number of variables selected by the best_model.

plot_AIC: AIC plot by each hyper-parameter. plot_c_index: C-Index plot by each hyper-parameter. plot_BRIER: Brier Score plot by each hyper-parameter. plot_AUC: AUC plot by each hyper-parameter.

class: Cross-Validated model class.

lst_train_indexes: List (of lists) of indexes for the observations used in each run/fold for train the models. lst_test_indexes: List (of lists) of indexes for the observations used in each run/fold for test the models.

time: time consumed for running the cross-validated function.

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

Examples

data("X_proteomic")
data("Y_proteomic")
set.seed(123)
index_train <- caret::createDataPartition(Y_proteomic$event, p = .5, list = FALSE, times = 1)
X_train <- X_proteomic[index_train,1:50]
Y_train <- Y_proteomic[index_train,]
cv.splsdrcox_model <- cv.splsdrcox(X_train, Y_train, max.ncomp = 2, penalty.list = c(0.1),
n_run = 1, k_folds = 2, x.center = TRUE, x.scale = TRUE)

[Package Coxmos version 1.0.2 Index]