mb.splsdacox {Coxmos} | R Documentation |
MB.sPLS-DACOX
Description
The MB.sPLS-DACOX function conducts a multi-block sparse partial least squares discriminant analysis Cox (MB.sPLS-DACOX) using a dynamic variable selection approach. This analysis is particularly suited for high-dimensional datasets where the goal is to identify the relationship between explanatory variables and survival outcomes. The function outputs a model of class "Coxmos" with an attribute labeled "MB.sPLS-DACOX".
Usage
mb.splsdacox(
X,
Y,
n.comp = 4,
vector = NULL,
MIN_NVAR = 10,
MAX_NVAR = 10000,
n.cut_points = 5,
EVAL_METHOD = "AUC",
x.center = TRUE,
x.scale = FALSE,
remove_near_zero_variance = TRUE,
remove_zero_variance = TRUE,
toKeep.zv = NULL,
remove_non_significant = TRUE,
alpha = 0.05,
MIN_AUC_INCREASE = 0.01,
pred.method = "cenROC",
max.iter = 200,
times = NULL,
max_time_points = 15,
MIN_EPV = 5,
returnData = TRUE,
verbose = FALSE
)
Arguments
X |
Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables. |
Y |
Numeric matrix or data.frame. Response variables. The object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively. |
n.comp |
Numeric. Number of latent components to compute for the (s)PLS model (default: 4). |
vector |
Numeric vector or list. Used for computing the best number of variables. As many values as components must be provided. If vector = NULL, automatic detection is performed (default: NULL). If vector is a list, it must be named with the names of the X param, each followed by the number of variables to select. |
MIN_NVAR |
Numeric. Minimum range size for computing cut points to select the best number of variables to use (default: 10). |
MAX_NVAR |
Numeric. Maximum range size for computing cut points to select the best number of variables to use (default: 10000). |
n.cut_points |
Numeric. Number of cut points for searching the optimal number of variables. If only two cut points are selected, the minimum and maximum sizes are used. For MB approaches, at least n.cut_points^n.blocks models will be computed (default: 5). |
EVAL_METHOD |
Character. If EVAL_METHOD = "AUC", the AUC metric will be used to compute the best number of variables. Otherwise, the c-index metric will be used (default: "AUC"). |
x.center |
Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE). |
x.scale |
Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE). |
remove_near_zero_variance |
Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE). |
remove_zero_variance |
Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE). |
toKeep.zv |
Character vector. Names of variables in X that must not be removed by (near) zero variance filtering (default: NULL). |
remove_non_significant |
Logical. If remove_non_significant = TRUE, non-significant variables/components in the final Cox model will be removed until all variables are significant by forward selection (default: TRUE). |
alpha |
Numeric. Numerical values are regarded as significant if they fall below the threshold (default: 0.05). |
MIN_AUC_INCREASE |
Numeric. Minimum improvement between different cross-validation models required to continue evaluating higher values of the tested parameters. If it is not reached for the next 'MIN_COMP_TO_CHECK' models and the minimum 'MIN_AUC' value is reached, the evaluation stops (default: 0.01). |
pred.method |
Character. AUC evaluation algorithm used to evaluate the model performance. Must be one of the following: "risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C", "smoothROCtime_I" (default: "cenROC"). |
max.iter |
Numeric. Maximum number of iterations for PLS convergence (default: 200). |
times |
Numeric vector. Time points where the AUC will be evaluated. If NULL, a maximum of 'max_time_points' points will be selected equally distributed (default: NULL). |
max_time_points |
Numeric. Maximum number of time points to use for evaluating the model (default: 15). |
MIN_EPV |
Numeric. Minimum number of Events Per Variable (EPV) to reach in the final Cox model. Used to restrict the number of variables/components that can be computed in final Cox models. If the minimum is not met, the model cannot be computed (default: 5). |
returnData |
Logical. Return original and normalized X and Y matrices (default: TRUE). |
verbose |
Logical. If verbose = TRUE, extra messages could be displayed (default: FALSE). |
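The input requirements above (binary-coded qualitative variables in X, and a Y with "time" and "event" columns) can be sketched in base R. This is a minimal illustration on made-up data: the names `clinical`, `clinical_bin`, `X_toy` and `Y_toy` are hypothetical, and wrapping the block in a named list simply mirrors the `X$mirna` / `X$proteomic` layout used in the Examples section.

```r
# Toy data with one qualitative variable (hypothetical values).
clinical <- data.frame(
  age   = c(61, 54, 70, 48),
  stage = factor(c("I", "II", "III", "II"))
)

# Expand the factor into 0/1 indicator columns and drop the intercept column.
clinical_bin <- model.matrix(~ ., data = clinical)[, -1, drop = FALSE]

# Response with the two required column names: "time" and "event".
Y_toy <- data.frame(
  time  = c(12.5, 30.1, 7.8, 22.0),
  event = c(1, 0, 1, 0)   # 1 = event, 0 = censored
)
rownames(Y_toy) <- rownames(clinical_bin)

# One named block per data source (assumption: same layout as X_multiomic).
X_toy <- list(clinical = clinical_bin)
```

Here `model.matrix()` uses the first factor level ("I") as the reference, so only the "II" and "III" indicators are kept.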
Details
The MB.sPLS-DACOX methodology is designed to handle multi-block datasets, where each block represents a set of related variables. By employing a sparse partial least squares approach, the function efficiently selects relevant variables from each block, ensuring that the final model is both interpretable and predictive. The Cox proportional hazards model is then applied to the selected variables to assess their association with survival outcomes.
The function offers flexibility in terms of parameter tuning. For instance, users can specify the number of latent components to compute, the range of variables to consider for optimal selection, and the evaluation metric (either AUC or c-index). Additionally, data preprocessing options are available, such as centering and scaling of the explanatory variables, and removal of variables with near-zero or zero variance.
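To get a feel for the cost of the variable search and for the EPV restriction, the arithmetic can be worked through in base R. This is only a sketch under stated assumptions: the evenly spaced cut points, the cap at a block size of 50 (taken from the Examples), and the event count `n_events` are illustrative, and the package may place cut points or count events differently.

```r
# Settings taken from the Usage defaults and the Examples section.
n.cut_points <- 5
n.blocks     <- 2    # e.g. the mirna and proteomic blocks
MIN_NVAR     <- 10
block_size   <- 50   # each block is truncated to 50 variables in the Examples

# Lower bound on the number of candidate models: n.cut_points^n.blocks.
n_models <- n.cut_points^n.blocks

# ASSUMPTION: evenly spaced candidate sizes; the real placement may differ.
cut_points <- round(seq(MIN_NVAR, block_size, length.out = n.cut_points))
grid <- expand.grid(mirna = cut_points, proteomic = cut_points)

# EPV check: with a hypothetical 40 events and MIN_EPV = 5, the final Cox
# model could hold at most floor(40 / 5) predictors.
n_events       <- 40
MIN_EPV        <- 5
max_predictors <- floor(n_events / MIN_EPV)
```

Each row of `grid` is one combination of per-block variable counts, which is why the model count grows exponentially with the number of blocks.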
Value
Instance of class "Coxmos" and model "MB.sPLS-DACOX". The class contains the following elements:
X: List of normalized X data information.
- (data): normalized X matrix
- (weightings): PLS weights
- (weightings_norm): PLS normalized weights
- (W.star): PLS W* vector
- (scores): PLS scores/variates
- (E): error matrices
- (x.mean): mean values for X matrix
- (x.sd): standard deviation for X matrix
Y: List of normalized Y data information.
- (deviance_residuals): deviance residual vector used as Y matrix in the sPLS
- (dr.mean): mean values for deviance residuals Y matrix
- (dr.sd): standard deviation for deviance residuals Y matrix
- (data): normalized Y matrix
- (y.mean): mean values for Y matrix
- (y.sd): standard deviation for Y matrix
survival_model: List of survival model information.
- fit: coxph object
- AIC: AIC of Cox model
- BIC: BIC of Cox model
- lp: linear predictors for train data
- coef: coefficients for Cox model
- YChapeau: Y Chapeau residuals
- Yresidus: Y residuals
mb.model: List of sPLS models computed for each block.
n.comp: Number of components selected.
n.varX: Number of variables selected for each block.
call: function call
X_input: X input matrix
Y_input: Y input matrix
B.hat: PLS beta matrix
R2: PLS R2
SCR: PLS SCR
SCT: PLS SCT
nzv: Variables removed by remove_near_zero_variance or remove_zero_variance.
nz_coeffvar: Variables removed because their coefficient of variation is near zero.
time: time consumed for running the Cox analysis.
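The elements listed above can be read from the returned object with `$`. A minimal sketch, assuming the Coxmos package and its example data are installed and that the fit succeeds; the element names are taken from the list above, while the object name `model` is only illustrative.

```r
# Run only when Coxmos is available; otherwise the block is skipped.
if (requireNamespace("Coxmos", quietly = TRUE)) {
  library(Coxmos)
  data("X_multiomic")
  data("Y_multiomic")

  # Same reduced data and call as in the Examples section.
  X <- X_multiomic
  X$mirna <- X$mirna[, 1:50]
  X$proteomic <- X$proteomic[, 1:50]
  Y <- Y_multiomic
  model <- mb.splsdacox(X, Y, n.comp = 2, vector = NULL,
                        x.center = TRUE, x.scale = TRUE)

  # Inspect a few of the documented elements.
  print(model$n.comp)               # number of components selected
  print(model$n.varX)               # variables selected for each block
  print(model$survival_model$fit)   # the underlying coxph object
}
```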
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
References
Rohart F, Gautier B, Singh A, Cao KAL (2017). “mixOmics: An R package for ‘omics feature selection and multiple data integration.” PLoS Computational Biology, 13(11). ISSN 15537358, https://pubmed.ncbi.nlm.nih.gov/29099853/.
Examples
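library(Coxmos)
# Load the example multi-omic blocks and the survival response
data("X_multiomic")
data("Y_multiomic")
# Restrict each block to its first 50 variables
X <- X_multiomic
X$mirna <- X$mirna[,1:50]
X$proteomic <- X$proteomic[,1:50]
Y <- Y_multiomic
# Fit with 2 components; vector = NULL triggers automatic detection
mb.splsdacox(X, Y, n.comp = 2, vector = NULL, x.center = TRUE, x.scale = TRUE)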