BcorSIS {MFSIS} | R Documentation |
A Generic Sure Independence Screening Procedure
Description
A generic nonparametric sure independence screening procedure, called BCor-SIS, on the basis of a recently developed universal dependence measure: Ball correlation. We show that the proposed procedure has strong screening consistency even when the dimensionality is an exponential order of the sample size without imposing sub-exponential moment assumptions on the data.
Usage
BcorSIS(X, Y, nsis = (dim(X)[1])/log(dim(X)[1]))
Arguments
X |
The design matrix of dimensions n * p. Each row is an observation vector. |
Y |
The response vector of dimension n * 1. For survival models, Y should be an object of class Surv, as provided by the function Surv() in the package survival. |
nsis |
Number of predictors recruited by BcorSIS. The default is n/log(n). |
Value
the labels of first nsis largest active set of all predictors
Author(s)
Xuewei Cheng xwcheng@hunnu.edu.cn
References
Pan, W., X. Wang, H. Zhang, H. Zhu, and J. Zhu (2020). Ball covariance: A generic measure of dependence in banach space. Journal of the American Statistical Association 115(529),307–317.
Pan, W., X. Wang, W. Xiao, and H. Zhu (2019). A generic sure independence screening procedure. Journal of the American Statistical Association 114(526), 928–937.
Examples
## Scenario 1 generate complete data
n <- 100
p <- 200
rho <- 0.5
data <- GendataLM(n, p, rho, error = "gaussian")
data <- cbind(data[[1]], data[[2]])
colnames(data)[1:ncol(data)] <- c(paste0("X", 1:(ncol(data) - 1)), "Y")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 1)]
Y <- data[, ncol(data)]
A1 <- BcorSIS(X, Y, n / log(n))
A1
## Scenario 2 generate survival data
library(survival)
n <- 100
p <- 200
rho <- 0.5
data <- GendataCox(n, p, rho)
data <- cbind(data[[1]], data[[2]], data[[3]])
colnames(data)[ncol(data)] <- c("status")
colnames(data)[(ncol(data) - 1)] <- c("time")
colnames(data)[(1:(ncol(data) - 2))] <- c(paste0("X", 1:(ncol(data) - 2)))
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 2)]
Y <- Surv(data[, (ncol(data) - 1)], data[, ncol(data)])
A2 <- BcorSIS(X, Y, n / log(n))
A2