uni.selection {compound.Cox} R Documentation

## Univariate feature selection based on univariate significance tests

### Description

This function performs univariate feature selection using significance tests (Wald tests or score tests) of the association between each individual feature and survival. A feature is selected if its P-value is less than a given threshold (P.value).
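The selection rule can be illustrated with a minimal sketch. It is not the internal code of uni.selection. It assumes the `survival` package is available and uses simulated data in which only one feature affects the hazard. The names `X`, `t.sim`, `d.sim`, `wald.p`, and `threshold` are hypothetical.

```r
library(survival)  # assumed available; ships with standard R installations

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))

# Hypothetical simulated data: only gene1 affects the hazard
time.true <- rexp(n, rate = exp(1.0 * X[, 1]))
cens <- rexp(n, rate = 0.3)
t.sim <- pmin(time.true, cens)           # observed time
d.sim <- as.numeric(time.true <= cens)   # 1 = death, 0 = censoring

# Wald P-value from a univariate Cox model fitted to each feature
wald.p <- apply(X, 2, function(x) {
  fit <- coxph(Surv(t.sim, d.sim) ~ x)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})

# Keep the features whose P-value is below the threshold
threshold <- 0.001
selected <- names(wald.p)[wald.p < threshold]
selected
```

A smaller threshold selects fewer features. The score test option would replace the Wald P-value in the same loop.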

### Usage

uni.selection(t.vec, d.vec, X.mat, P.value=0.001,K=10,score=TRUE,d0=0,
randomize=FALSE,CC.plot=FALSE,permutation=FALSE,M=200)


### Arguments

| Argument | Description |
|---|---|
| t.vec | Vector of survival times (time to either death or censoring) |
| d.vec | Vector of censoring indicators (1=death, 0=censoring) |
| X.mat | n by p matrix of covariates, where n is the sample size and p is the number of covariates |
| P.value | A threshold for selecting features |
| K | The number of cross-validation folds |
| score | If TRUE, the score tests are used; if not, the Wald tests are used |
| d0 | A non-negative constant (default 0) to stabilize the variance of score statistics (Witten & Tibshirani 2010) |
| randomize | If TRUE, randomize patient IDs before cross-validation |
| CC.plot | If TRUE, the compound covariate (CC) predictors are plotted |
| permutation | If TRUE, the FDR is computed by a permutation method (Witten & Tibshirani 2010; Emura et al. 2019) |
| M | The number of permutations used to calculate the FDR |

### Details

The cross-validated likelihood (CVL) value is computed for the selected features (Matsui 2006; Emura et al. 2019). A higher CVL value corresponds to better predictive ability of the selected features, so the CVL value can be used to find the optimal set of features. The CVL value is computed by K-fold cross-validation, where K is chosen by the user. The false discovery rate (FDR) is also computed, both by a formula and by a permutation test (if "permutation=TRUE").

RCVL1 and RCVL2 are "re-substitution" CVL values that provide upper control limits for the CVL value. If the CVL value is less than both RCVL1 and RCVL2, the CVL value is considered in control. If the CVL value exceeds either RCVL1 or RCVL2, the CVL may be computed again after changing the sample allocation.
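The sample allocation used in K-fold cross-validation can be sketched as follows. This is an assumed illustration of how samples might be split into folds, with optional shuffling in the spirit of "randomize=TRUE". It is not taken from the package source, and the names `idx`, `fold.id`, and `folds` are hypothetical.

```r
set.seed(1)
n <- 23          # hypothetical sample size
K <- 5           # number of folds
randomize <- TRUE

# Optionally shuffle the sample indices before splitting
idx <- if (randomize) sample(n) else seq_len(n)

# Assign positions 1..n to K contiguous folds of nearly equal size
fold.id <- cut(seq_len(n), breaks = K, labels = FALSE)
folds <- split(idx, fold.id)

# Every sample is held out in exactly one fold
sapply(folds, length)
```

Changing the seed changes the allocation, which is one way to recompute the CVL under a different split.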

### Value

| Value | Description |
|---|---|
| gene | Gene symbols |
| beta | Estimated regression coefficients |
| Z | Z-values for significance tests |
| P | P-values for significance tests |
| CVL | The values of CVL, RCVL1, and RCVL2 (Emura et al. 2019) |
| Genes | The number of genes, the number of selected genes, and the number of falsely selected genes |
| FDR | False discovery rate (by a formula or a permutation method) |
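As a worked illustration of the Genes and FDR entries, the sketch below uses the standard plug-in estimate: if no gene were associated with survival, about p times P.value genes would fall below the threshold by chance. It is an assumption that this matches the formula used by the package. All numbers and the names `n.selected`, `n.false`, and `fdr` are hypothetical.

```r
p <- 97            # hypothetical total number of genes
P.value <- 0.05    # selection threshold
n.selected <- 20   # hypothetical number of genes with P < P.value

# Expected number of falsely selected genes under the global null
n.false <- p * P.value

# Plug-in FDR estimate, capped at 1
fdr <- min(1, n.false / n.selected)

c(Genes = p, Selected = n.selected, False = n.false, FDR = fdr)
```

With these inputs the expected number of false selections is 4.85 and the estimated FDR is 0.2425.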

### Author(s)

Takeshi Emura

### References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival. Computer Methods and Programs in Biomedicine 168: 21-37.

Matsui S (2006). Predicting Survival Outcomes Using Subsets of Significant Genes in Prognostic Marker Studies with Microarrays. BMC Bioinformatics 7: 156.

Witten DM, Tibshirani R (2010). Survival analysis with high-dimensional covariates. Stat Method Med Res 19: 29-51.

### Examples

library(compound.Cox)
data(Lung)
## use the training samples only; columns 1-3 are not gene expressions ##
t.vec=Lung$t.vec[Lung$train==TRUE]
d.vec=Lung$d.vec[Lung$train==TRUE]
X.mat=Lung[Lung$train==TRUE,-c(1,2,3)]
uni.selection(t.vec, d.vec, X.mat, P.value=0.05,K=5,score=FALSE)
## the outputs reproduce Table 3 of Emura and Chen (2016) ##


[Package compound.Cox version 3.20 Index]