
simulateMVSISNY {VariableScreening}

R Documentation

Simulate a dataset for demonstrating the performance of screenIID with the MV-SIS method with numeric outcome Y

Description

Simulates a dataset that can be used to demonstrate variable screening for ultrahigh-dimensional regression with categorical predictors and a numerical outcome variable using the MV-SIS-NY option in screenIID. The simulated dataset has p categorical predictors X and a numerical response Y. The X covariates are generated as binary with success probability 0.5 each. The response Y is generated as Y = 5*X1 + 5*X2 + 5*X12 + 5*X22 + e, where Xj denotes the j-th covariate (so only covariates 1, 2, 12 and 22 affect Y) and e is a standard normal error term. Special thanks are due to Wei Zhong for providing some of the code upon which this function is based.
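The generating mechanism described above can be sketched in base R. This is an illustration only, not the package source: the function name `simulate_sketch` is hypothetical, and the sketch assumes that the subscripts in the formula are column indices of X.

```r
# Hypothetical re-implementation for illustration; not the package's code.
simulate_sketch <- function(n = 500, p = 1000) {
  stopifnot(p >= 22)  # the formula uses columns 1, 2, 12 and 22
  # Binary covariates: each entry is an independent Bernoulli(0.5) draw
  X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
  # Numeric response: four active covariates plus standard normal noise
  Y <- 5 * X[, 1] + 5 * X[, 2] + 5 * X[, 12] + 5 * X[, 22] + rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
sim <- simulate_sketch(n = 200, p = 50)
dim(sim$X)     # n rows, p columns
length(sim$Y)  # one response per subject
```

Because only four of the p columns enter the formula, a good screening method should rank those four columns ahead of the remaining noise covariates.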

Usage

simulateMVSISNY(n = 500, p = 1000)

Arguments

`n`	Number of subjects in the dataset to be simulated. It also equals the number of rows in the simulated dataset, because each row is assumed to represent a different independent and identically distributed subject.
`p`	Number of predictor variables (covariates) in the simulated dataset. These covariates will be the features screened by MV-SIS.

Value

A list with the following components:

`X`	Matrix of predictors to be screened. It will have n rows and p columns.
`Y`	Vector of responses. It will have length n.

References

Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110: 630-641. <DOI:10.1080/01621459.2014.920256>

Examples

set.seed(12345678)
results <- simulateMVSISNY()
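
A possible follow-up is to pass the simulated data to screenIID with the MV-SIS-NY option named in the Description. The sketch below makes two assumptions that should be checked against the screenIID help page: that the option is selected through an argument called `method`, and that the returned list has a `rank` component in which rank 1 marks the strongest covariate. The smaller n and p are chosen only to keep the run short.

```r
# Guarded so the snippet is skipped when the package is not installed.
if (requireNamespace("VariableScreening", quietly = TRUE)) {
  set.seed(12345678)
  results <- VariableScreening::simulateMVSISNY(n = 200, p = 100)

  # Inspect the returned list: a matrix X and a numeric vector Y
  str(results, max.level = 1)

  # Assumption: `method = "MV-SIS-NY"` selects the numeric-outcome variant
  screened <- VariableScreening::screenIID(results$X, results$Y,
                                           method = "MV-SIS-NY")

  # Assumption: `screened$rank` holds each covariate's rank (1 = strongest).
  # The four active covariates (1, 2, 12, 22) would be expected near the top.
  head(order(screened$rank), 4)
}
```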

[Package VariableScreening version 0.2.1 Index]