CSIS {MFSIS} | R Documentation |
Model-Free Feature Screening Based on the Concordance Index Statistic
Description
A model-free, data-adaptive feature screening method for ultrahigh-dimensional data, including survival data. The method is based on the concordance index, which measures concordance between two random vectors even when one of them is a survival object of class Surv. Because it relies on rank correlation, it does not require specifying a regression model and is robust to censoring and heavy-tailed data. It enjoys both the sure screening and rank consistency properties under weak assumptions.
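To make the idea of concordance-based screening concrete, here is a minimal base-R sketch for an uncensored response. The helpers `concordance_index()` and `screen_by_concordance()` are hypothetical names introduced for illustration; they are not part of MFSIS and are not claimed to match how CSIS computes its statistic.

```r
# Hypothetical helper (not part of MFSIS): pairwise concordance between a
# numeric predictor x and a numeric, uncensored response y.
concordance_index <- function(x, y) {
  dx <- outer(x, x, "-")               # x_i - x_j for all pairs
  dy <- outer(y, y, "-")               # y_i - y_j for all pairs
  comparable <- dy != 0                # pairs with distinct responses
  concordant <- comparable & (dx * dy > 0)
  tied_x     <- comparable & (dx == 0) # ties in x count as one half
  (sum(concordant) + 0.5 * sum(tied_x)) / sum(comparable)
}

# Hypothetical helper: rank predictors by how far their concordance lies
# from 0.5 (the value expected under no association) and keep the top nsis.
screen_by_concordance <- function(X, y, nsis) {
  utility <- abs(apply(X, 2, concordance_index, y = y) - 0.5)
  order(utility, decreasing = TRUE)[seq_len(nsis)]
}

set.seed(1)
n <- 50
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 3] + rnorm(n, sd = 0.1)   # only column 3 drives the response
top <- screen_by_concordance(X, y, nsis = 5)
top
```

Because the statistic depends only on the ordering of the observations, a monotone transformation of `y` or heavy-tailed noise leaves the ranking largely unaffected, which is the intuition behind the robustness claim above.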
Usage
CSIS(X, Y, nsis = (dim(X)[1])/log(dim(X)[1]))
Arguments
X |
The design matrix of dimensions n * p. Each row is an observation vector. |
Y |
The response vector of dimension n * 1. For survival models, Y should be an object of class Surv, as provided by the function Surv() in the package survival. |
nsis |
Number of predictors recruited by CSIS. The default is n/log(n). |
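As a quick check of the default, the snippet below evaluates n/log(n) for n = 100, the sample size used in the examples. The value is not an integer; whether CSIS rounds it internally is not stated here, so the `floor()` call is only an assumption about how one might obtain a whole number of predictors.

```r
n <- 100
nsis_default <- n / log(n)        # log() is the natural logarithm; about 21.71
nsis_int <- floor(nsis_default)   # assumption: truncate to a whole number, 21
nsis_int
```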
Value
The labels of the nsis top-ranked predictors, that is, the predictors with the largest screening statistics, which form the estimated active set.
Author(s)
Xuewei Cheng xwcheng@hunnu.edu.cn
References
Cheng X, Li G, Wang H. The concordance filter: an adaptive model-free feature screening procedure. Computational Statistics, 2023: 1-24.
Examples
library(MFSIS)
## Scenario 1: generate complete (uncensored) data
n <- 100
p <- 200
rho <- 0.5
data <- GendataLM(n, p, rho, error = "gaussian")
# Combine predictors and response, then name the columns X1, ..., Xp, Y
data <- cbind(data[[1]], data[[2]])
colnames(data) <- c(paste0("X", 1:(ncol(data) - 1)), "Y")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 1)]
Y <- data[, ncol(data)]
A1 <- CSIS(X, Y, n / log(n))
A1
## Scenario 2: generate survival data
library(survival)
n <- 100
p <- 200
rho <- 0.5
data <- GendataCox(n, p, rho)
# Combine predictors, survival time, and status, then name the columns
data <- cbind(data[[1]], data[[2]], data[[3]])
colnames(data) <- c(paste0("X", 1:(ncol(data) - 2)), "time", "status")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 2)]
Y <- Surv(data[, (ncol(data) - 1)], data[, ncol(data)])
A2 <- CSIS(X, Y, n / log(n))
A2
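For intuition about how concordance can be measured when the response is a survival object, the following base-R sketch computes a Harrell-type concordance between one predictor and a censored time. The function `censored_concordance()` is a hypothetical name, and the status coding (1 = event observed, 0 = censored) is an assumption; this is not claimed to be the statistic CSIS uses internally.

```r
# Hypothetical helper (not part of MFSIS): Harrell-type concordance between
# a predictor x and a possibly censored survival time.
# Assumption: status is coded 1 = event observed, 0 = censored.
censored_concordance <- function(x, time, status) {
  n  <- length(x)
  dt <- outer(time, time, "-")          # time_i - time_j
  dx <- outer(x, x, "-")                # x_i - x_j
  event_i <- matrix(status == 1, n, n)  # row i is TRUE when subject i had an event
  # A pair (i, j) is usable when i fails first and that failure is observed.
  usable     <- (dt < 0) & event_i
  concordant <- usable & (dx > 0)       # larger x goes with earlier failure
  tied_x     <- usable & (dx == 0)
  (sum(concordant) + 0.5 * sum(tied_x)) / sum(usable)
}

time <- c(1, 2, 3)
x    <- c(1, 3, 2)
c_all_events <- censored_concordance(x, time, status = c(1, 1, 1))
c_censored   <- censored_concordance(x, time, status = c(0, 1, 1))
c(c_all_events, c_censored)
```

In the second call the first subject is censored, so pairs that start from that subject are dropped and only the pair formed by subjects 2 and 3 contributes. A screening utility could then be taken as the absolute distance of this value from 0.5, so the direction of the association does not matter.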