cv.isb.splsicox {Coxmos} | R Documentation |
Cross validation cv.isb.splsicox
Description
This function performs cross-validation for the iterative single-block sparse partial least squares individual Cox (sPLS-ICOX) method. It returns the optimal number of components and the optimal sparsity penalty value based on cross-validation. Performance can be assessed with several metrics, such as Area Under the Curve (AUC), Brier Score, or C-Index, and more than one metric can be used simultaneously.
Usage
cv.isb.splsicox(
X,
Y,
max.ncomp = 8,
penalty.list = seq(0.1, 0.9, 0.2),
n_run = 3,
k_folds = 10,
x.center = TRUE,
x.scale = FALSE,
remove_near_zero_variance = TRUE,
remove_zero_variance = TRUE,
toKeep.zv = NULL,
remove_variance_at_fold_level = FALSE,
remove_non_significant_models = FALSE,
remove_non_significant = FALSE,
alpha = 0.05,
w_AIC = 0,
w_c.index = 0,
w_AUC = 1,
w_BRIER = 0,
times = NULL,
max_time_points = 15,
MIN_AUC_INCREASE = 0.01,
MIN_AUC = 0.8,
MIN_COMP_TO_CHECK = 3,
pred.attr = "mean",
pred.method = "cenROC",
fast_mode = FALSE,
MIN_EPV = 5,
returnData = TRUE,
return_models = FALSE,
PARALLEL = FALSE,
verbose = FALSE,
seed = 123
)
Arguments
X |
Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables. |
Y |
Numeric matrix or data.frame. Response variables. Object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively. |
max.ncomp |
Numeric. Maximum number of PLS components to compute for the cross validation (default: 8). |
penalty.list |
Numeric vector. Penalty values for variable selection in the individual Cox models. Variables with a p-value lower than 1 - "penalty" in the individual Cox analysis will be kept for the sPLS-ICOX approach (default: seq(0.1,0.9,0.2)). |
n_run |
Numeric. Number of runs for cross validation (default: 3). |
k_folds |
Numeric. Number of folds for cross validation (default: 10). |
x.center |
Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE). |
x.scale |
Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE). |
remove_near_zero_variance |
Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE). |
remove_zero_variance |
Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE). |
toKeep.zv |
Character vector. Names of variables in X that must not be deleted by (near) zero variance filtering (default: NULL). |
remove_variance_at_fold_level |
Logical. If remove_variance_at_fold_level = TRUE, (near) zero variance variables will be removed at fold level (default: FALSE). |
remove_non_significant_models |
Logical. If remove_non_significant_models = TRUE, non-significant models are removed before computing the evaluation (default: FALSE). |
remove_non_significant |
Logical. If remove_non_significant = TRUE, non-significant variables/components in the final Cox model will be removed until all variables are significant by forward selection (default: FALSE). |
alpha |
Numeric. Significance threshold. P-values are regarded as significant if they fall below this value (default: 0.05). |
w_AIC |
Numeric. Weight for the AIC evaluator. All weights must sum to 1 (default: 0). |
w_c.index |
Numeric. Weight for the C-Index evaluator. All weights must sum to 1 (default: 0). |
w_AUC |
Numeric. Weight for the AUC evaluator. All weights must sum to 1 (default: 1). |
w_BRIER |
Numeric. Weight for the Brier Score evaluator. All weights must sum to 1 (default: 0). |
times |
Numeric vector. Time points at which the AUC will be evaluated. If NULL, up to 'max_time_points' equally distributed time points will be selected (default: NULL). |
max_time_points |
Numeric. Maximum number of time points to use for evaluating the model (default: 15). |
MIN_AUC_INCREASE |
Numeric. Minimum improvement between different cross validation models to continue evaluating higher values in the multiple tested parameters. If it is not reached for the next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value is reached, the evaluation stops (default: 0.01). |
MIN_AUC |
Numeric. Minimum AUC desired for the cross-validation models. Once this minimum is reached, the evaluation can stop if later models do not improve the AUC by at least 'MIN_AUC_INCREASE' (default: 0.8). |
MIN_COMP_TO_CHECK |
Numeric. Number of penalties/components to evaluate to check whether the AUC improves. If the AUC does not improve over the next 'MIN_COMP_TO_CHECK' models and 'MIN_AUC' is met, the evaluation can stop (default: 3). |
pred.attr |
Character. Way to evaluate the metric selected. Must be one of the following: "mean" or "median" (default: "mean"). |
pred.method |
Character. AUC evaluation algorithm used to evaluate model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC"). |
fast_mode |
Logical. If fast_mode = TRUE, for each run, each fold is evaluated separately. If fast_mode = FALSE, for each run, linear predictors are computed for all test observations; once every observation has its linear predictor, the evaluation is performed across all observations together (default: FALSE). |
MIN_EPV |
Numeric. Minimum number of Events Per Variable (EPV) to reach for the final Cox model. Used to restrict the number of variables/components that can be computed in final Cox models. If the minimum is not met, the model cannot be computed (default: 5). |
returnData |
Logical. Return original and normalized X and Y matrices (default: TRUE). |
return_models |
Logical. Return all models computed in cross validation (default: FALSE). |
PARALLEL |
Logical. Run the cross validation with multicore option. As many cores as your total cores - 1 will be used. It could lead to higher RAM consumption (default: FALSE). |
verbose |
Logical. If verbose = TRUE, extra messages could be displayed (default: FALSE). |
seed |
Number. Seed value for performing runs/folds divisions (default: 123). |
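The stopping parameters MIN_AUC, MIN_AUC_INCREASE and MIN_COMP_TO_CHECK described above interact, which can be hard to picture from the descriptions alone. The following is a minimal sketch of one possible reading of that rule, not the package's internal code: the helper stop_index() and the AUC values are hypothetical, and it assumes models are evaluated in order of increasing complexity.

```r
# Hypothetical helper: NOT part of Coxmos. One reading of the documented rule:
# stop at the first model i whose AUC is at least MIN_AUC and for which none of
# the next MIN_COMP_TO_CHECK models improves on that AUC by MIN_AUC_INCREASE.
stop_index <- function(auc, MIN_AUC = 0.8, MIN_AUC_INCREASE = 0.01,
                       MIN_COMP_TO_CHECK = 3) {
  n <- length(auc)
  for (i in seq_len(n)) {
    last <- i + MIN_COMP_TO_CHECK
    if (last > n) break  # not enough later models left to apply the check
    later <- auc[(i + 1):last]
    no_gain <- all(later - auc[i] < MIN_AUC_INCREASE)
    if (auc[i] >= MIN_AUC && no_gain) {
      return(last)  # index of the last model evaluated before stopping
    }
  }
  n  # rule never triggered: every model is evaluated
}

# Made-up AUC values for models with 1 to 8 components
auc <- c(0.70, 0.78, 0.82, 0.825, 0.82, 0.815, 0.83, 0.84)
stop_index(auc)
```

With these made-up values, the third model reaches MIN_AUC and the following three models do not improve on it by MIN_AUC_INCREASE, so under this reading the later models would not be evaluated.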
Details
The cv.isb.splsicox function performs cross-validation for the iterative single-block sparse partial least squares individual Cox analysis. Unlike the single-block (sb) approach, where each block is analyzed with the same number of components and penalties, the iterative single-block (isb) approach allows for the specification of different numbers of components and penalties for each block. This provides a more tailored analysis for each block, recognizing that different blocks may have varying complexities and relationships with the outcome.
The function is designed to handle datasets with multiple blocks, processing each block individually in an iterative manner. This ensures a detailed examination of each block's contribution to the survival outcome without the interference of other blocks. This approach is distinct from multiblock methods where all blocks are analyzed simultaneously.
The cross-validation process involves partitioning the dataset into multiple subsets (folds) and then iteratively training the model on a subset of the data while validating it on the remaining data. This helps in determining the optimal hyperparameters for the model, such as the number of latent components and the penalty for variable selection.
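The partitioning described above can be sketched in base R. This is an illustration only, not how Coxmos builds its folds internally: the helper make_folds() is hypothetical, and it ignores refinements such as balancing events across folds.

```r
# Hypothetical helper: NOT part of Coxmos. Builds n_run independent random
# partitions of n observations into k_folds held-out sets.
make_folds <- function(n, n_run = 3, k_folds = 10, seed = 123) {
  set.seed(seed)
  lapply(seq_len(n_run), function(run) {
    # Shuffle fold labels so each observation lands in exactly one fold
    fold_id <- sample(rep(seq_len(k_folds), length.out = n))
    split(seq_len(n), fold_id)  # list of held-out row indices, one per fold
  })
}

folds <- make_folds(n = 50, n_run = 2, k_folds = 5)

# Run 1, fold 1: rows held out for validation and rows used for training
test_rows  <- folds[[1]][[1]]
train_rows <- setdiff(seq_len(50), test_rows)
```

Each run yields a different random partition, so every observation is held out exactly once per run.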
Unlike the sb approach, which returns the optimal hyperparameters for further model training, the isb approach directly returns the final model. This model is constructed using the best-performing hyperparameters for each block, ensuring a more customized and potentially more accurate model.
The function offers flexibility in specifying various hyperparameters and options for data preprocessing. The output provides a comprehensive overview of the cross-validation results, including metrics like AIC, C-Index, Brier Score, and AUC for each hyperparameter combination. Visualization tools are also provided to aid in understanding the model's performance across different hyperparameters.
Value
Instance of class "Coxmos" and model "sb.splsicox". The class contains the following elements:
X: List of normalized X data information.
- (data): normalized X matrix
- (weightings): PLS weights
- (weightings_norm): PLS normalized weights
- (W.star): PLS W* vector
- (scores): PLS scores/variates
- (x.mean): mean values for X matrix
- (x.sd): standard deviation for X matrix
Y: List of normalized Y data information.
- (deviance_residuals): deviance residual vector used as Y matrix in the sPLS
- (dr.mean): mean values for deviance residuals Y matrix
- (dr.sd): standard deviation for deviance residuals Y matrix
- (data): normalized Y matrix
- (y.mean): mean values for Y matrix
- (y.sd): standard deviation for Y matrix
survival_model: List of survival model information.
- fit: coxph object.
- AIC: AIC of cox model.
- BIC: BIC of cox model.
- lp: linear predictors for train data.
- coef: Coefficients for cox model.
- YChapeau: Y Chapeau residuals.
- Yresidus: Y residuals.
list_spls_models: List of sPLS-ICOX models computed for each block.
n.comp: Number of components selected.
penalty: Penalty applied.
call: call function
X_input: X input matrix
Y_input: Y input matrix
nzv: Variables removed by remove_near_zero_variance or remove_zero_variance.
nz_coeffvar: Variables removed by coefficient variation near zero.
class: Model class.
time: time consumed for running the cox analysis.
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
Examples
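# Load the example multi-omic blocks and the survival response
data("X_multiomic")
data("Y_multiomic")
set.seed(123)
# Use about half of the observations for training to keep the example fast
index_train <- caret::createDataPartition(Y_multiomic$event, p = .5, list = FALSE, times = 1)
X_train <- X_multiomic
# Restrict each block to the training rows and its first 20 variables
X_train$mirna <- X_train$mirna[index_train, 1:20]
X_train$proteomic <- X_train$proteomic[index_train, 1:20]
Y_train <- Y_multiomic[index_train, ]
# Minimal cross-validation: one component, one penalty, one run with two folds
isb.splsicox_model <- cv.isb.splsicox(X_train, Y_train, max.ncomp = 1, penalty.list = c(0.5),
                                      n_run = 1, k_folds = 2, x.center = TRUE, x.scale = TRUE)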