R: sPLSDA-COX Dynamic

splsdacox_dynamic {Coxmos}

R Documentation

sPLSDA-COX Dynamic

Description

The splsdacox_dynamic function performs a sparse partial least squares discriminant analysis Cox (sPLSDA-COX) using a dynamic variable selection methodology. This method is particularly useful for high-dimensional survival data where the goal is to identify the subset of variables that is most predictive of survival outcomes. The function combines sPLSDA with the Cox proportional hazards model to support survival analysis on large datasets.

Usage

splsdacox_dynamic(
  X,
  Y,
  n.comp = 4,
  vector = NULL,
  MIN_NVAR = 10,
  MAX_NVAR = 1000,
  n.cut_points = 5,
  MIN_AUC_INCREASE = 0.01,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  remove_non_significant = FALSE,
  alpha = 0.05,
  EVAL_METHOD = "AUC",
  pred.method = "cenROC",
  max.iter = 200,
  times = NULL,
  max_time_points = 15,
  MIN_EPV = 5,
  returnData = TRUE,
  verbose = FALSE
)

Arguments

`X`	Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables.
`Y`	Numeric matrix or data.frame. Response variables. Object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively.
`n.comp`	Numeric. Number of latent components to compute for the (s)PLS model (default: 4).
`vector`	Numeric vector. Used for computing the best number of variables. As many values as components must be provided. If vector = NULL, automatic detection is performed (default: NULL).
`MIN_NVAR`	Numeric. Minimum range size for computing cut points to select the best number of variables to use (default: 10).
`MAX_NVAR`	Numeric. Maximum range size for computing cut points to select the best number of variables to use (default: 1000).
`n.cut_points`	Numeric. Number of cut points for searching the optimal number of variables. If only two cut points are selected, the minimum and maximum sizes are used. For multiblock (MB) approaches, at least n.cut_points^n.blocks models will be computed (default: 5).
`MIN_AUC_INCREASE`	Numeric. Minimum improvement between successive cross-validation models required to continue evaluating higher values of the tested parameters. If this improvement is not reached for the next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value has been reached, the evaluation stops (default: 0.01).
`x.center`	Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE).
`x.scale`	Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE).
`remove_near_zero_variance`	Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE).
`remove_zero_variance`	Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE).
`toKeep.zv`	Character vector. Names of variables in X that must not be removed by (near) zero variance filtering (default: NULL).
`remove_non_significant`	Logical. If remove_non_significant = TRUE, non-significant variables/components in the final Cox model will be removed by forward selection until all remaining variables are significant (default: FALSE).
`alpha`	Numeric. Significance threshold. P-values below this threshold are regarded as significant (default: 0.05).
`EVAL_METHOD`	Character. If EVAL_METHOD = "AUC", the AUC metric will be used to compute the best number of variables. Otherwise, the c-index metric will be used (default: "AUC").
`pred.method`	Character. AUC evaluation algorithm used to evaluate the model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC").
`max.iter`	Numeric. Maximum number of iterations for PLS convergence (default: 200).
`times`	Numeric vector. Time points where the AUC will be evaluated. If NULL, up to 'max_time_points' equally distributed time points will be selected (default: NULL).
`max_time_points`	Numeric. Maximum number of time points to use for evaluating the model (default: 15).
`MIN_EPV`	Numeric. Minimum number of Events Per Variable (EPV) to reach in the final Cox model. Used to restrict the number of variables/components that can be included in final Cox models. If the minimum is not met, the model cannot be computed (default: 5).
`returnData`	Logical. Return original and normalized X and Y matrices (default: TRUE).
`verbose`	Logical. If verbose = TRUE, extra messages could be displayed (default: FALSE).
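
The input requirements for `X` and `Y` can be illustrated with a minimal base R sketch. The toy data frame `raw` and its columns `age` and `stage` are hypothetical and only serve as an illustration; the column names "time" and "event" are the ones required by the function.

```r
# Hypothetical toy data; "age" and "stage" are illustrative names only.
raw <- data.frame(
  age   = c(61, 54, 70, 48),
  stage = factor(c("I", "II", "III", "II"))
)

# Expand the qualitative variable into 0/1 indicator columns.
# The intercept column is dropped; level "I" acts as the reference level.
X <- model.matrix(~ ., data = raw)[, -1, drop = FALSE]

# Y must contain the columns "time" and "event" (0 = censored, 1 = event).
Y <- data.frame(
  time  = c(12.5, 30.0, 7.2, 44.1),
  event = c(1, 0, 1, 0)
)
rownames(Y) <- rownames(X)
```

Other encodings of qualitative variables are possible; the only documented requirement is that they end up as binary columns in `X`.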

Details

The function begins by checking the input parameters for consistency and ensuring that the response variable Y has the required columns "time" and "event". It then preprocesses the data by centering and scaling (if specified), and removing variables with zero or near-zero variance. The function also checks for multicollinearity in the data and addresses it if detected.
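
As a rough illustration of the preprocessing described above, the following base R sketch removes zero variance columns and centers the remaining ones. It is not the package's internal code: the variable `keep` only mimics the role of `toKeep.zv`, and near-zero variance filtering and the multicollinearity check are not shown.

```r
# Toy matrix; "v3" is constant, so its variance is zero.
X <- cbind(
  v1 = c(1, 2, 3, 4),
  v2 = c(10, 0, 5, 5),
  v3 = c(7, 7, 7, 7)
)

# Names that should survive the filter regardless of variance
# (plays the role of toKeep.zv; empty here).
keep <- character(0)

# Drop zero variance columns unless they are explicitly kept.
variances <- apply(X, 2, var)
X_filtered <- X[, variances > 0 | colnames(X) %in% keep, drop = FALSE]

# Center to zero mean without scaling (x.center = TRUE, x.scale = FALSE).
X_centered <- scale(X_filtered, center = TRUE, scale = FALSE)
```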

The core of the function involves determining the optimal number of variables to retain in the model. If the vector parameter is not provided, the function employs a strategy to identify the best number of variables for each latent component. This is achieved by evaluating different combinations of variables and selecting the one that maximizes the model's performance, as determined by the specified evaluation metric (EVAL_METHOD).
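
One way to picture the role of MIN_NVAR, MAX_NVAR and n.cut_points is as a grid of candidate model sizes. The helper `candidate_sizes` below is hypothetical and is not part of Coxmos; it assumes evenly spaced candidates, which may differ from how the package actually places its cut points.

```r
# Hypothetical helper, not part of Coxmos: evenly spaced candidate numbers
# of variables between a lower and an upper bound.
candidate_sizes <- function(min_nvar, max_nvar, n_cut_points, n_available) {
  upper <- min(max_nvar, n_available)  # cannot keep more variables than exist
  lower <- min(min_nvar, upper)
  unique(round(seq(lower, upper, length.out = n_cut_points)))
}

# Five candidates for a data set with 200 variables.
sizes <- candidate_sizes(min_nvar = 10, max_nvar = 1000,
                         n_cut_points = 5, n_available = 200)

# With two cut points, only the minimum and maximum sizes remain.
extremes <- candidate_sizes(min_nvar = 10, max_nvar = 1000,
                            n_cut_points = 2, n_available = 200)
```

Each candidate size would then be scored with the chosen evaluation metric, and the best-scoring size retained for the component.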

Once the optimal number of variables is determined, the function proceeds to compute the sPLSDA-COX model. It employs the mixOmics::splsda function to compute the sPLSDA model, which is then integrated with the Cox proportional hazards model. The resulting model provides insights into the relationship between the predictor variables and survival outcomes.
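
The MIN_EPV argument limits how many variables/components the final Cox model may contain. The arithmetic can be sketched as follows; the helper `max_predictors_for_epv` is hypothetical and only illustrates the events-per-variable ratio, not the package's internal check.

```r
# Hypothetical helper, not part of Coxmos: the largest number of predictors
# for which events / predictors is still at least min_epv.
max_predictors_for_epv <- function(event, min_epv = 5) {
  n_events <- sum(as.logical(event))  # accepts 0/1 or FALSE/TRUE
  floor(n_events / min_epv)
}

# Toy event indicator with 9 events among 12 observations.
event <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1)
allowed <- max_predictors_for_epv(event, min_epv = 5)
```

With few events, lowering n.comp or MIN_EPV may be necessary for a model to be computed.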

The function also offers the flexibility to refine the model further by removing non-significant variables based on a specified alpha threshold.

Value

Instance of class "Coxmos" and model "sPLS-DACOX-Dynamic". The class contains the following elements:

X: List of normalized X data information.

(data): normalized X matrix
(weightings): sPLS weights
(W.star): sPLS W* vector
(loadings): sPLS loadings
(scores): sPLS scores/variates
(x.mean): mean values for X matrix
(x.sd): standard deviation for X matrix

Y: List of normalized Y data information.

(data): normalized Y matrix
(y.mean): mean values for Y matrix
(y.sd): standard deviation for Y matrix

survival_model: List of survival model information.

fit: coxph object.
AIC: AIC of cox model.
BIC: BIC of cox model.
lp: linear predictors for train data.
coef: Coefficients for cox model.
YChapeau: Y Chapeau residuals.
Yresidus: Y residuals.

n.comp: Number of components selected.

n.varX: Number of Variables selected in each PLS component.

var_by_component: Variables selected in each PLS component.

plot_accuracyPerVariable: If vector = NULL, a plot that helps to understand how the number of variables was selected.

call: function call

X_input: X input matrix

Y_input: Y input matrix

alpha: alpha value selected

nsv: Variables removed by cox alpha cutoff.

nzv: Variables removed by remove_near_zero_variance or remove_zero_variance.

nz_coeffvar: Variables removed because their coefficient of variation is near zero.

class: Model class.

time: time taken to run the Cox analysis.

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

References

Rohart F, Gautier B, Singh A, Lê Cao KA (2017). “mixOmics: An R package for 'omics feature selection and multiple data integration.” PLoS Computational Biology, 13(11). ISSN 15537358, https://pubmed.ncbi.nlm.nih.gov/29099853/.

Examples

data("X_proteomic")
data("Y_proteomic")
X <- X_proteomic[,1:20]
Y <- Y_proteomic
splsdacox_dynamic(X, Y, n.comp = 2, vector = NULL, x.center = TRUE, x.scale = TRUE)

[Package Coxmos version 1.0.2 Index]