splsdrcox_dynamic {Coxmos}    R Documentation

sPLS-DRCOX Dynamic

Description

The sPLS-DRCOX Dynamic function performs a sparse partial least squares deviance residual Cox regression analysis using a dynamic variable selection approach. This method is particularly useful for high-dimensional survival data, where variable selection is crucial. The function returns a model of class "Coxmos" with the attribute model specified as "sPLS-DRCOX-Dynamic".

Usage

splsdrcox_dynamic(
  X,
  Y,
  n.comp = 4,
  vector = NULL,
  MIN_NVAR = 10,
  MAX_NVAR = 1000,
  n.cut_points = 5,
  MIN_AUC_INCREASE = 0.01,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  remove_non_significant = FALSE,
  alpha = 0.05,
  EVAL_METHOD = "AUC",
  pred.method = "cenROC",
  max.iter = 200,
  times = NULL,
  max_time_points = 15,
  MIN_EPV = 5,
  returnData = TRUE,
  verbose = FALSE
)

Arguments

X

Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables.

Y

Numeric matrix or data.frame. Response variables. The object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively.

n.comp

Numeric. Number of latent components to compute for the (s)PLS model (default: 4).

vector

Numeric vector. Used for computing the best number of variables. As many values as components must be provided. If vector = NULL, an automatic detection is performed (default: NULL).

MIN_NVAR

Numeric. Minimum range size for computing cut points to select the best number of variables to use (default: 10).

MAX_NVAR

Numeric. Maximum range size for computing cut points to select the best number of variables to use (default: 1000).

n.cut_points

Numeric. Number of cut points for searching the optimal number of variables. If only two cut points are selected, the minimum and maximum sizes are used. For MB approaches, at least n.cut_points^n.blocks models will be computed (default: 5).

MIN_AUC_INCREASE

Numeric. Minimum improvement between different cross-validation models required to continue evaluating higher values of the tested parameters. If it is not reached for the next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value is reached, the evaluation stops (default: 0.01).

x.center

Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE).

x.scale

Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE).

remove_near_zero_variance

Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE).

remove_zero_variance

Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE).

toKeep.zv

Character vector. Names of variables in X that must not be removed by (near) zero variance filtering (default: NULL).

remove_non_significant

Logical. If remove_non_significant = TRUE, non-significant variables/components in the final Cox model will be removed by forward selection until all remaining variables are significant (default: FALSE).

alpha

Numeric. Significance threshold: values below it are regarded as significant (default: 0.05).

EVAL_METHOD

Character. If EVAL_METHOD = "AUC", the AUC metric will be used to compute the best number of variables. Otherwise, the c-index metric will be used (default: "AUC").

pred.method

Character. AUC evaluation algorithm used to evaluate the model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC").

max.iter

Numeric. Maximum number of iterations for PLS convergence (default: 200).

times

Numeric vector. Time points at which the AUC will be evaluated. If NULL, a maximum of 'max_time_points' equally distributed points will be selected (default: NULL).

max_time_points

Numeric. Maximum number of time points to use for evaluating the model (default: 15).

MIN_EPV

Numeric. Minimum number of Events Per Variable (EPV) to reach in the final Cox model. Used to restrict the number of variables/components that can be computed in final Cox models. If the minimum is not met, the model cannot be computed (default: 5).

returnData

Logical. Return original and normalized X and Y matrices (default: TRUE).

verbose

Logical. If verbose = TRUE, extra messages may be displayed (default: FALSE).
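
As a rough illustration of the MIN_EPV argument, the following base R sketch counts the events in a toy response and derives how many predictors a final Cox model could hold at a given EPV. The data values and the helper max_predictors_for_epv are hypothetical, and the exact rule applied inside Coxmos may differ.

```r
# Toy response with the column names required for Y (values are made up).
Y <- data.frame(
  time  = c(5, 8, 12, 3, 9, 15, 7, 11, 6, 14),
  event = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

# Hypothetical helper (not part of Coxmos): largest number of predictors
# such that events / predictors stays at or above min_epv.
max_predictors_for_epv <- function(Y, min_epv = 5) {
  n_events <- sum(as.logical(Y[, "event"]))
  floor(n_events / min_epv)
}

n_events <- sum(Y$event)                                # 7 events
max_pred <- max_predictors_for_epv(Y, min_epv = 5)      # floor(7 / 5) = 1
```

With only 7 events, an EPV of 5 leaves room for a single component under this rule, which is why small cohorts may need a lower MIN_EPV or fewer components.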

Details

The function employs a sparse partial least squares (sPLS) approach combined with deviance residuals from a Cox model to handle survival data. The dynamic variable selection methodology ensures that only the most relevant predictors are included in the model, enhancing interpretability and potentially improving predictive performance.
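
To make the role of deviance residuals more concrete, the sketch below fits a null Cox model with the survival package and extracts its deviance residuals, which is the quantity used as a continuous response in the deviance-residual PLS approach described by Bastien (2008). The toy data are made up, and this shows only the general idea; the internal steps of splsdrcox_dynamic are not reproduced here.

```r
library(survival)  # recommended package shipped with standard R installations

# Toy survival data (values are made up for illustration).
time  <- c(5, 8, 12, 3, 9, 15, 7, 11)
event <- c(1, 0, 1, 1, 0, 1, 1, 0)

# Null Cox model: no covariates, only the baseline hazard.
null_fit <- coxph(Surv(time, event) ~ 1)

# One deviance residual per observation; a PLS-type method can then
# regress these residuals on the explanatory matrix X.
dev_res <- residuals(null_fit, type = "deviance")
```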

The input matrices X and Y represent the explanatory and response variables, respectively. It is essential to note that qualitative variables in X should be transformed into binary format. The response matrix Y should have two columns named "time" and "event", where the "event" column can take values 0/1 or FALSE/TRUE, representing censored and event observations.
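
A minimal sketch of this data preparation in base R is shown below. The clinical data frame and its values are hypothetical; only the requirement of binary columns in X and of "time" and "event" columns in Y comes from this page.

```r
# Hypothetical clinical data with one qualitative variable (stage).
clinical <- data.frame(
  age   = c(61, 54, 70, 48),
  stage = factor(c("I", "II", "III", "II"))
)

# Expand the factor into 0/1 indicator columns; "- 1" drops the intercept
# so that every level of the factor gets its own column.
stage_dummies <- model.matrix(~ stage - 1, data = clinical)
X <- cbind(age = clinical$age, stage_dummies)

# Response with the two required column names; TRUE marks an observed event.
Y <- data.frame(
  time  = c(12.5, 30.1, 7.8, 22.0),
  event = c(TRUE, FALSE, TRUE, FALSE)
)
```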

Several parameters allow users to fine-tune the model. For instance, n.comp determines the number of latent components for the PLS model, and vector aids in computing the optimal number of variables. Parameters like MIN_NVAR and MAX_NVAR define the range for computing cut points to select the best number of variables. The function also provides options for data preprocessing, such as centering and scaling the X matrix and removing variables with near-zero or zero variance.
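
One way to picture how MIN_NVAR, MAX_NVAR and n.cut_points interact is the sketch below, which assumes evenly spaced candidate sizes capped at the number of available variables. The helper candidate_n_vars is hypothetical and the spacing rule is an assumption; the search actually performed by the package may differ.

```r
# Hypothetical helper (not part of Coxmos): evenly spaced candidate numbers
# of variables between the bounds, capped by the variables available in X.
candidate_n_vars <- function(n_available, min_nvar = 10, max_nvar = 1000,
                             n_cut_points = 5) {
  upper <- min(max_nvar, n_available)
  lower <- min(min_nvar, upper)
  unique(round(seq(lower, upper, length.out = n_cut_points)))
}

# With 50 variables in X, five cut points span 10 to 50.
cuts <- candidate_n_vars(n_available = 50)

# With two cut points, only the minimum and maximum sizes remain.
cuts_two <- candidate_n_vars(n_available = 50, n_cut_points = 2)
```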

The evaluation metric for determining the best number of variables can be specified using the EVAL_METHOD parameter. The function supports various evaluation algorithms for assessing model performance, as indicated by the pred.method parameter.
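
When times = NULL, evaluation time points are spread over the follow-up period. The sketch below shows one plausible reading, assuming points equally spaced between the smallest and largest observed time; the observed times are made up and the exact selection rule used by the package is not documented on this page.

```r
# Toy observed follow-up times (values are made up).
times_obs <- c(3, 5, 6, 7, 8, 9, 11, 12, 14, 15)

# Assumed rule: equally spaced points across the observed range,
# with at most max_time_points of them.
max_time_points <- 4
eval_times <- seq(min(times_obs), max(times_obs), length.out = max_time_points)
```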

Value

Instance of class "Coxmos" and model "sPLS-DRCOX-Dynamic". The class contains the following elements:

X: List of normalized X data information.

Y: List of normalized Y data information.

survival_model: List of survival model information.

n.comp: Number of components selected.

n.varX: Number of variables selected in each PLS component.

var_by_component: Variables selected in each PLS component.

plot_accuracyPerVariable: If vector = NULL, a plot showing how the number of variables was selected.

call: Function call.

X_input: X input matrix

Y_input: Y input matrix

beta_matrix: PLS beta matrix

R2: PLS R2

SCR: PLS SCR

SCT: PLS SCT

alpha: alpha value selected

nsv: Variables removed by the Cox alpha cutoff.

nzv: Variables removed by remove_near_zero_variance or remove_zero_variance.

nz_coeffvar: Variables removed because their coefficient of variation is near zero.

class: Model class.

time: Time taken to run the Cox analysis.

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

References

Bastien P (2008). “Deviance residuals based PLS regression for censored data in high dimensional setting.” Chemometrics and Intelligent Laboratory Systems. doi:10.1016/j.chemolab.2007.09.009, https://www.sciencedirect.com/science/article/abs/pii/S0169743907001931?via%3Dihub.

Bastien P, Bertrand F, Meyer N, Maumy-Bertrand M (2015). “Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data.” Bioinformatics. https://academic.oup.com/bioinformatics/article/31/3/397/2366078.

Rohart F, Gautier B, Singh A, Cao KAL (2017). “mixOmics: An R package for ‘omics feature selection and multiple data integration.” PLoS Computational Biology, 13(11). ISSN 15537358, https://pubmed.ncbi.nlm.nih.gov/29099853/.

Examples

data("X_proteomic")
data("Y_proteomic")
X <- X_proteomic[,1:50]
Y <- Y_proteomic
splsdrcox_dynamic(X, Y, n.comp = 3, vector = NULL, x.center = TRUE, x.scale = TRUE)

[Package Coxmos version 1.0.2 Index]