cox {Coxmos}    R Documentation

cox

Description

The cox function conducts a Cox proportional hazards regression analysis, a type of survival analysis. It is designed to handle right-censored data and is built upon the coxph function from the survival package. The function returns an object of class "Coxmos" with the attribute model labeled as "cox".

Usage

cox(
  X,
  Y,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = FALSE,
  toKeep.zv = NULL,
  remove_non_significant = FALSE,
  alpha = 0.05,
  MIN_EPV = 5,
  FORCE = FALSE,
  returnData = TRUE,
  verbose = FALSE
)

Arguments

X

Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables.

Y

Numeric matrix or data.frame. Response variables. The object must have two columns named "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively.

x.center

Logical. If x.center = TRUE, X matrix is centered to zero means (default: TRUE).

x.scale

Logical. If x.scale = TRUE, X matrix is scaled to unit variances (default: FALSE).

remove_near_zero_variance

Logical. If remove_near_zero_variance = TRUE, near zero variance variables will be removed (default: TRUE).

remove_zero_variance

Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: FALSE).

toKeep.zv

Character vector. Names of variables in X that must not be removed by (near) zero variance filtering (default: NULL).

remove_non_significant

Logical. If remove_non_significant = TRUE, non-significant variables/components in the final cox model will be removed by backward selection until all remaining variables are significant (default: FALSE).

alpha

Numeric. Significance threshold: p-values below alpha are regarded as significant (default: 0.05).

MIN_EPV

Numeric. Minimum number of Events Per Variable (EPV) required for the final cox model. Used to restrict the number of variables/components that can be included in the final cox model. If the minimum is not met, the model cannot be computed (default: 5).

FORCE

Logical. If MIN_EPV is not met, setting FORCE = TRUE allows the model to be computed anyway (default: FALSE).

returnData

Logical. Return original and normalized X and Y matrices (default: TRUE).

verbose

Logical. If verbose = TRUE, extra messages may be displayed (default: FALSE).
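The input requirements for X and Y described above can be sketched in base R as follows. This is only an illustration on invented toy data: the objects raw, X_toy and Y_toy and the variables age and stage are hypothetical and are not part of Coxmos. It shows one common way (model.matrix) to turn a qualitative variable into binary indicator columns, and a Y data.frame with the two required columns.

```r
# Hypothetical toy data; "age" and "stage" are illustrative names only.
raw <- data.frame(
  age   = c(61, 54, 70, 48, 66, 59),
  stage = factor(c("I", "II", "III", "I", "II", "III"))
)

# Expand the qualitative variable into 0/1 indicator columns.
# model.matrix() also creates an intercept column, which is dropped here.
X_toy <- model.matrix(~ age + stage, data = raw)[, -1, drop = FALSE]

# Y must contain exactly the columns "time" and "event"
# (1 or TRUE = event observed, 0 or FALSE = censored).
Y_toy <- data.frame(
  time  = c(12.0, 30.5, 7.2, 44.1, 18.9, 25.0),
  event = c(1, 0, 1, 0, 1, 1)
)

colnames(X_toy)
colnames(Y_toy)
```

With the default treatment coding, a factor with k levels yields k - 1 indicator columns, the first level acting as the reference.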

Details

The Cox proportional hazards regression model is a semi-parametric model in which the logarithm of the hazard rate is a linear function of one or more predictor variables. This function offers several preprocessing steps to improve the quality and robustness of the fitted model.
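For reference, the standard textbook form of the model can be written as below. The symbols are notation introduced here for illustration and do not appear elsewhere in this page: h_0(t) is an unspecified baseline hazard, x_1, ..., x_p are the columns of X, and beta_1, ..., beta_p are the regression coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)
\quad\Longleftrightarrow\quad
\log\frac{h(t \mid x)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_j
```

Under this form, exp(beta_j) is the hazard ratio associated with a one-unit increase in x_j, holding the other predictors fixed.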

The function allows for the centering and scaling of predictor variables, which can be essential for the stability and interpretability of the model. It also provides options to remove variables with near-zero or zero variance, which can be problematic in regression analyses. Such variables offer little to no information and can lead to overfitting.
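The preprocessing ideas in the paragraph above can be illustrated with a minimal base R sketch. It is not the code used inside cox(): the toy matrix and object names are hypothetical, only exactly-zero variance is handled, and near-zero variance filtering (which relies on additional criteria) is not shown.

```r
set.seed(1)

# Hypothetical toy matrix with one constant (zero-variance) column.
X_toy <- cbind(
  a        = rnorm(20, mean = 5,  sd = 2),
  b        = rnorm(20, mean = -1, sd = 0.5),
  constant = rep(3, 20)
)

# Drop columns whose variance is exactly zero.
col_var <- apply(X_toy, 2, var)
X_kept  <- X_toy[, col_var > 0, drop = FALSE]

# Center each column to zero mean and scale it to unit variance,
# analogous in spirit to x.center = TRUE and x.scale = TRUE.
X_norm <- scale(X_kept, center = TRUE, scale = TRUE)

colnames(X_kept)
round(colMeans(X_norm), 10)
apply(X_norm, 2, sd)
```

Removing the constant column before scaling matters: dividing by a standard deviation of zero would produce NaN values.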

Another notable feature is the ability to remove non-significant predictors from the final model through a backward selection process. This ensures that only variables that contribute significantly to the model are retained.
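The backward selection idea can be sketched with coxph from the survival package. This is an illustration under stated assumptions, not the procedure implemented in Coxmos: backward_cox is a hypothetical helper, the data are simulated, and the sketch assumes numeric predictors so that each predictor has exactly one coefficient.

```r
library(survival)

# Hypothetical helper: repeatedly drop the predictor with the largest
# p-value until every remaining predictor has p < alpha.
backward_cox <- function(data, predictors, alpha = 0.05) {
  repeat {
    if (length(predictors) == 0) return(NULL)  # nothing significant left
    form <- as.formula(
      paste("Surv(time, event) ~", paste(predictors, collapse = " + "))
    )
    fit   <- coxph(form, data = data)
    coefs <- summary(fit)$coefficients
    pvals <- setNames(coefs[, "Pr(>|z|)"], rownames(coefs))
    if (all(pvals < alpha)) return(fit)
    predictors <- setdiff(predictors, names(which.max(pvals)))
  }
}

# Simulated data: the hazard depends on x1 only; x2 and x3 are noise.
set.seed(42)
n   <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
t_event <- rexp(n, rate = exp(1 * toy$x1))  # true coefficient of x1 is 1
t_cens  <- rexp(n, rate = 0.3)              # independent censoring times
toy$time  <- pmin(t_event, t_cens)
toy$event <- as.numeric(t_event <= t_cens)

fit <- backward_cox(toy, predictors = c("x1", "x2", "x3"), alpha = 0.05)
names(coef(fit))
```

Because each step refits the model, a variable that is non-significant in the full model is judged again only in the context of the variables still retained.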

The function also checks for the minimum number of events per variable (EPV) to ensure the robustness of the model. If the specified EPV is not met, the function can either halt the computation or proceed based on user preference.
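As a rough illustration of the EPV check, the sketch below uses the usual definition of EPV as the number of observed events divided by the number of variables in the model. The helper events_per_variable and the toy vector are hypothetical, and the exact rule applied inside cox() may differ.

```r
# Hypothetical helper: events per variable = observed events / variables.
events_per_variable <- function(event, n_variables) {
  sum(event == 1) / n_variables
}

event   <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)  # 7 events, 3 censored
min_epv <- 5

epv <- events_per_variable(event, n_variables = 2)
epv              # 3.5
epv >= min_epv   # FALSE: two variables are too many for 7 events

# Largest number of variables that still satisfies the threshold.
max_variables <- floor(sum(event == 1) / min_epv)
max_variables    # 1
```

In this toy case a two-variable model would fall below MIN_EPV = 5, which is the situation where FORCE = TRUE would be needed to fit it anyway.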

This function fits a standard Cox regression and may not be suitable for high-dimensional data. In such cases, consider alternative methods such as coxEN() or the PLS-based Cox methods.

Value

Instance of class "Coxmos" and model "cox". The class contains the following elements:

X: List of normalized X data information.

Y: List of normalized Y data information.

survival_model: List of survival model information.

call: Function call.

X_input: X input matrix.

Y_input: Y input matrix.

nsv: Variables removed by remove_non_significant if any.

nzv: Variables removed by remove_near_zero_variance or remove_zero_variance.

nz_coeffvar: Variables removed for having a coefficient of variation near zero.

removed_variables_correlation: Variables removed for being highly correlated with other variables.

class: Model class.

time: Time taken to run the cox analysis.

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

References

Cox D (1972). “Regression models and life tables (with discussion).” Journal of the Royal Statistical Society: Series B (Methodological). https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.

Concato J, Peduzzi P, Holford TR, Feinstein AR (1995). “Importance of events per independent variable in proportional hazards analysis I. Background, goals, and general strategy.” Journal of Clinical Epidemiology. https://doi.org/10.1016/0895-4356(95)00510-2, https://pubmed.ncbi.nlm.nih.gov/8543963/.

Therneau TM (2024). A Package for Survival Analysis in R. R package version 3.5-8, https://CRAN.R-project.org/package=survival.

Examples

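# Load the example proteomic data sets
data("X_proteomic")
data("Y_proteomic")
# Use only the first 10 variables of X
X <- X_proteomic[,1:10]
Y <- Y_proteomic
# Fit the Cox model on centered and scaled predictors
cox(X, Y, x.center = TRUE, x.scale = TRUE)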

[Package Coxmos version 1.0.2 Index]