uni.selection {compound.Cox} | R Documentation |
This function performs univariate feature selection using significance tests (Wald tests or score tests) based on association between individual features and survival. Features are selected if their P-values are less than a given threshold (P.value).
uni.selection(t.vec, d.vec, X.mat, P.value=0.001,K=10,score=TRUE,d0=0,
randomize=FALSE,CC.plot=FALSE,permutation=FALSE,M=200)
t.vec |
Vector of survival times (time to either death or censoring) |
d.vec |
Vector of censoring indicators (1=death, 0=censoring) |
X.mat |
n by p matrix of covariates, where n is the sample size and p is the number of covariates |
P.value |
A threshold for selecting features |
K |
The number of cross-validation folds |
score |
If TRUE, the score tests are used; if not, the Wald tests are used |
d0 |
A positive constant to stabilize the variance of score statistics (Witten & Tibshirani 2010) |
randomize |
If TRUE, randomize patient ID's before cross-validation |
CC.plot |
If TRUE, the compound covariate (CC) predictors are plotted |
permutation |
If TRUE, the FDR is computed by a permutation method (Witten & Tibshirani 2010; Emura et al. 2019). |
M |
The number of permutations to calculate the FDR |
The cross-validated likelihood (CVL) value is computed for selected features (Matsui 2006; Emura et al. 2019). A high CVL value corresponds to a better predictive ability of selected features. Hence, the CVL value can be used to find the optimal set of features. The CVL value is computed by a K-fold cross-validation, where the number K can be chosen by user. The false discovery rate (FDR) is also computed by a formula and a permutation test (if "permutation=TRUE"). The RCVL1 and RCVL2 are "re-substitution" CVL values and provide upper control limits for the CVL value. If the CVL value is less than RCVL1 and RCVL2 values, the CVL value would be in-control. On the other hand, if the CVL value exceeds either RCVL1 or RCVL2 value, then the CVL may be computed again after changing the sample allocation.
gene |
Gene symbols |
beta |
Estimated regression coefficients |
Z |
Z-values for significance tests |
P |
P-values for significance tests |
CVL |
The value of CVL, RCVL1, and RCVL2 (Emura et al. 2019) |
Genes |
The number of genes, the number of selected genes, and the number of falsely selected genes |
FDR |
False discovery rate (by a formula or a permutation method) |
Takeshi Emura
Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.
Matsui S (2006). Predicting Survival Outcomes Using Subsets of Significant Genes in Prognostic Marker Studies with Microarrays. BMC Bioinformatics: 7:156.
Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51
data(Lung)
t.vec=Lung$t.vec[Lung$train==TRUE]
d.vec=Lung$d.vec[Lung$train==TRUE]
X.mat=Lung[Lung$train==TRUE,-c(1,2,3)]
uni.selection(t.vec, d.vec, X.mat, P.value=0.05,K=5,score=FALSE)
## the outputs reproduce Table 3 of Emura and Chen (2016) ##