cox {Coxmos} | R Documentation |
cox
Description
The cox function conducts a Cox proportional hazards regression analysis, a type of survival analysis. It is designed to handle right-censored data and is built upon the coxph function from the survival package. The function returns an object of class "Coxmos" with the attribute model labeled as "cox".
Usage
cox(
  X,
  Y,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = FALSE,
  toKeep.zv = NULL,
  remove_non_significant = FALSE,
  alpha = 0.05,
  MIN_EPV = 5,
  FORCE = FALSE,
  returnData = TRUE,
  verbose = FALSE
)
Arguments
X |
Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables. |
Y |
Numeric matrix or data.frame. Response variables. Object must have two columns named as "time" and "event". For event column, accepted values are: 0/1 or FALSE/TRUE for censored and event observations. |
x.center |
Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE). |
x.scale |
Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE). |
remove_near_zero_variance |
Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE). |
remove_zero_variance |
Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: FALSE). |
toKeep.zv |
Character vector. Name of variables in X to not be deleted by (near) zero variance filtering (default: NULL). |
remove_non_significant |
Logical. If remove_non_significant = TRUE, non-significant variables/components in the final cox model will be removed by backward selection until all remaining variables are significant (default: FALSE). |
alpha |
Numeric. Significance threshold: P-values below this value are regarded as significant (default: 0.05). |
MIN_EPV |
Numeric. Minimum number of Events Per Variable (EPV) you want to reach for the final cox model. Used to restrict the number of variables/components that can be included in the final cox model. If the minimum is not met, the model cannot be computed (default: 5). |
FORCE |
Logical. If FORCE = TRUE, the model is computed even when MIN_EPV is not met (default: FALSE). |
returnData |
Logical. Return original and normalized X and Y matrices (default: TRUE). |
verbose |
Logical. If verbose = TRUE, extra messages may be displayed (default: FALSE). |
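The requirements on X and Y can be illustrated with base R. The sketch below is only one possible way to prepare the inputs: the data frame raw and its columns are made up for illustration, and model.matrix() is just one of several ways to turn a qualitative variable into binary columns.

```r
# Hypothetical raw data: one numeric and one qualitative predictor
raw <- data.frame(
  age   = c(61, 54, 70, 48, 66, 59),
  stage = factor(c("I", "II", "III", "I", "II", "III")),
  time  = c(12.5, 30.1, 4.2, 45.0, 18.7, 9.9),
  event = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Expand the factor into 0/1 dummy columns and drop the intercept column
X <- model.matrix(~ age + stage, data = raw)[, -1, drop = FALSE]

# Y needs the two columns "time" and "event"
Y <- data.frame(time = raw$time, event = as.numeric(raw$event))
rownames(Y) <- rownames(X)

colnames(X)
colnames(Y)
```

The first factor level ("I") acts as the reference category, so it does not get a column of its own.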
Details
The Cox proportional hazards regression model is a semi-parametric model in which the logarithm of the hazard rate depends linearly on one or more predictor variables, while the baseline hazard is left unspecified. The function provided here offers several preprocessing steps to ensure the quality and robustness of the model.
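For reference, the model is commonly written as below. The notation is not taken from this help page and is introduced only for illustration: h_0(t) is the unspecified baseline hazard, x_1, ..., x_p are the predictors (the columns of X) and beta_1, ..., beta_p are the coefficients estimated by coxph.

```latex
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right)
\quad\Longleftrightarrow\quad
\log\frac{h(t \mid x)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_j
```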
The function allows for the centering and scaling of predictor variables, which can be essential for the stability and interpretability of the model. It also provides options to remove variables with near-zero or zero variance, which can be problematic in regression analyses. Such variables offer little to no information and can lead to overfitting.
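The effect of these preprocessing options can be sketched with base R. This is not the internal Coxmos implementation: the simulated matrix is made up, only exact zero variance is filtered here, and the criterion the package uses for near-zero variance may differ.

```r
set.seed(1)

# Simulated predictors; the third column is constant (zero variance)
X <- cbind(
  a        = rnorm(20, mean = 5, sd = 2),
  b        = rnorm(20, mean = -1, sd = 10),
  constant = rep(3, 20)
)

# Drop columns whose variance is exactly zero
vars   <- apply(X, 2, var)
X_kept <- X[, vars > 0, drop = FALSE]

# Center to zero mean and scale to unit variance
X_norm <- scale(X_kept, center = TRUE, scale = TRUE)

colnames(X_norm)
round(colMeans(X_norm), 10)
apply(X_norm, 2, sd)
```

Filtering before scaling matters in this sketch, because scaling a constant column would divide by a standard deviation of zero.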
Another notable feature is the ability to remove non-significant predictors from the final model through a backward selection process. This ensures that only variables that contribute significantly to the model are retained.
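A generic backward elimination loop might look like the sketch below. It is an assumption about the general technique, not the exact procedure used inside cox(): it calls survival::coxph directly, uses the lung data set shipped with the survival package (not part of this help page), and assumes each predictor is numeric so that it contributes exactly one coefficient.

```r
library(survival)

# Complete cases of a few numeric predictors from the lung data
d     <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")])
vars  <- c("age", "sex", "ph.ecog", "wt.loss")
alpha <- 0.05

repeat {
  # Refit the Cox model with the predictors that are still in play
  f   <- as.formula(paste("Surv(time, status) ~", paste(vars, collapse = " + ")))
  fit <- coxph(f, data = d)

  # One Wald p-value per predictor, named in the same order as vars
  p <- setNames(summary(fit)$coefficients[, "Pr(>|z|)"], vars)

  # Stop when every remaining predictor is significant (or only one is left)
  if (all(p < alpha) || length(vars) == 1) break

  # Otherwise drop the predictor with the largest p-value
  vars <- setdiff(vars, names(p)[which.max(p)])
}

vars
```

Removing one predictor per iteration and refitting avoids discarding a variable whose p-value would change once a weaker predictor has left the model.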
The function also checks for the minimum number of events per variable (EPV) to ensure the robustness of the model. If the specified EPV is not met, the function can either halt the computation or proceed based on user preference.
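The EPV check can be illustrated with a few lines of base R. The sketch assumes the usual definition of EPV (number of events divided by number of variables); how cox() computes it internally is not shown in this help page, and the event vector and n_vars below are made up.

```r
# Hypothetical event indicator: 1 = event, 0 = censored
event  <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
n_vars <- 3   # hypothetical number of predictors in the model
min_epv <- 5  # same value as the documented MIN_EPV default

n_events <- sum(event == 1)
epv      <- n_events / n_vars

# Is the requested minimum reached with this many predictors?
epv_ok <- epv >= min_epv

# Largest number of predictors that would still satisfy min_epv
max_vars <- floor(n_events / min_epv)

epv_ok
max_vars
```

With few events, the check fails for all but the smallest models, which is the situation where FORCE = TRUE would be needed to proceed.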
It's important to note that while this function is tailored for standard Cox regression, it might not be suitable for high-dimensional data. In such cases, users are advised to consider alternative methods like coxEN() or PLS-based Cox methods.
Value
Instance of class "Coxmos" and model "cox". The class contains the following elements:
X
: List of normalized X data information.
- (data): normalized X matrix
- (x.mean): mean values for X matrix
- (x.sd): standard deviation for X matrix
Y
: List of normalized Y data information.
- (data): normalized Y matrix
- (y.mean): mean values for Y matrix
- (y.sd): standard deviation for Y matrix
survival_model
: List of survival model information.
- fit: coxph object.
- AIC: AIC of cox model.
- BIC: BIC of cox model.
- lp: linear predictors for train data.
- coef: Coefficients for cox model.
- YChapeau: Y Chapeau residuals.
- Yresidus: Y residuals.
call
: call function
X_input
: X input matrix
Y_input
: Y input matrix
nsv
: Variables removed by remove_non_significant if any.
nzv
: Variables removed by remove_near_zero_variance or remove_zero_variance.
nz_coeffvar
: Variables removed because their coefficient of variation is near zero.
removed_variables_correlation
: Variables removed for being highly correlated with other variables.
class
: Model class.
time
: time consumed for running the cox analysis.
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
References
Cox D (1972). “Regression models and life tables (with discussion).” Journal of the Royal Statistical Society: Series B (Methodological). https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.
Concato J, Peduzzi P, Holford TR, Feinstein AR (1995). “Importance of events per independent variable in proportional hazards analysis I. Background, goals, and general strategy.” Journal of Clinical Epidemiology. doi:10.1016/0895-4356(95)00510-2, https://pubmed.ncbi.nlm.nih.gov/8543963/.
Therneau TM (2024). A Package for Survival Analysis in R. R package version 3.5-8, https://CRAN.R-project.org/package=survival.
Examples
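library(Coxmos)
# Load the example proteomic data sets
data("X_proteomic")
data("Y_proteomic")
# Keep the first 10 predictors only
X <- X_proteomic[, 1:10]
Y <- Y_proteomic
# Fit the Cox model on centered and scaled predictors
cox(X, Y, x.center = TRUE, x.scale = TRUE)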