CSIS {MFSIS}    R Documentation

Model-Free Feature Screening Based on Concordance Index Statistic

Description

A model-free, data-adaptive feature screening method for ultrahigh-dimensional data, including survival data. The method is based on the concordance index, which measures concordance between random vectors even when one of them is a survival object of class Surv. Because it relies on rank correlation, it does not require specifying a regression model and remains robust in the presence of censoring and heavy tails. It enjoys both the sure screening and rank consistency properties under weak assumptions.
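
As a rough illustration of the idea of screening by concordance, the sketch below computes a plain pairwise concordance index between one predictor and an uncensored response, then ranks predictors by how far that index lies from 0.5. The helper concordance_index() is hypothetical and written only for this illustration; it is not the statistic implemented inside CSIS, and it does not handle censoring.

```r
# Hypothetical helper (not part of MFSIS): share of comparable pairs that
# x and y order the same way; pairs tied in x count as half-concordant.
concordance_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  dx <- outer(x, x, "-")                  # dx[i, j] = x[i] - x[j]
  dy <- outer(y, y, "-")                  # dy[i, j] = y[i] - y[j]
  comparable <- upper.tri(dx) & dy != 0   # each pair once, y not tied
  concordant <- sum(sign(dx[comparable]) == sign(dy[comparable]))
  ties_x <- sum(dx[comparable] == 0)
  (concordant + 0.5 * ties_x) / sum(comparable)
}

# Simulated toy data: only the third predictor drives the response
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 5), n, 5)
y <- 2 * X[, 3] + rnorm(n, sd = 0.1)

# Marginal screening score: distance of the index from 0.5 (no association)
score <- abs(apply(X, 2, concordance_index, y = y) - 0.5)
order(score, decreasing = TRUE)           # predictors from strongest to weakest
```

An index near 1 or 0 indicates a strong monotone association in either direction, which is why the score is measured as the distance from 0.5.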

Usage

CSIS(X, Y, nsis = (dim(X)[1])/log(dim(X)[1]))

Arguments

X

The design matrix of dimensions n * p. Each row is an observation vector.

Y

The response vector of dimension n * 1. For survival models, Y should be an object of class Surv, as provided by the function Surv() in the package survival.

nsis

Number of predictors recruited by CSIS. The default is n/log(n).
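
The arguments above can be sketched as follows. The toy times and status values are made up for illustration. The default n/log(n) is generally not a whole number, and how CSIS treats a non-integer nsis is not stated here, so flooring it first is shown only as one cautious option.

```r
library(survival)  # provides Surv()

# Toy survival response: status 1 = event observed, 0 = right-censored
time   <- c(5, 8, 12, 3)
status <- c(1, 0, 1, 1)
Y <- Surv(time, status)
class(Y)

# Default number of recruited predictors for n = 100 observations
n <- 100
nsis_default <- n / log(n)
nsis_default          # about 21.71
floor(nsis_default)   # 21
```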

Value

The labels of the nsis top-ranked predictors, i.e. the estimated active set.

Author(s)

Xuewei Cheng xwcheng@csu.edu.cn

Examples


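## Scenario 1: generate complete (uncensored) data
library(MFSIS)
n <- 100
p <- 200
rho <- 0.5
data <- GendataLM(n, p, rho, error = "gaussian")
# Bind the two list components into one matrix; the last column is the response
data <- cbind(data[[1]], data[[2]])
colnames(data) <- c(paste0("X", 1:(ncol(data) - 1)), "Y")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 1)]
Y <- data[, ncol(data)]
# Recruit n/log(n) predictors (the default value of nsis)
A1 <- CSIS(X, Y, n / log(n))
A1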

## Scenario 2: generate survival data
library(MFSIS)
library(survival)
n <- 100
p <- 200
rho <- 0.5
data <- GendataCox(n, p, rho)
# Bind the three list components; the last two columns are time and status
data <- cbind(data[[1]], data[[2]], data[[3]])
colnames(data) <- c(paste0("X", 1:(ncol(data) - 2)), "time", "status")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 2)]
# Build the survival response from the time and status columns
Y <- Surv(data[, "time"], data[, "status"])
A2 <- CSIS(X, Y, n / log(n))
A2


[Package MFSIS version 0.2.0 Index]