Simdata {MFSIS} | R Documentation |
Generate simulation data (The unified class framework to generate simulation data)
Description
This function helps you quickly generate simulation data. You only need to supply the sample size and dimension of the data you want to generate, together with the covariance parameter rho. A wide range of models is supported.
Usage
Simdata(
n,
p,
rho,
beta = c(rep(1, 5), rep(0, p - 5)),
error = c("gaussian", "t", "cauchy"),
R = 3,
style = c("balanced", "unbalanced"),
lambda = 0.1,
order = 2,
type = c("a", "b"),
model = c("linear", "nonlinear", "binomial", "poisson", "classification", "Cox",
"interaction", "group", "multivariate", "AFT")
)
Arguments
n |
Number of subjects in the dataset to be simulated. This also equals the number of rows in the simulated dataset, because each row is assumed to represent a different independent and identically distributed subject. |
p |
Number of predictor variables (covariates) in the simulated dataset. These covariates will be the features screened by model-free procedures. |
rho |
The correlation between adjacent covariates in the simulated matrix X. The within-subject covariance matrix of X is assumed to have the same form as an AR(1) auto-regressive covariance matrix, although this is not meant to imply that the X covariates for each subject are in fact a time series. Instead, it is just used as an example of a parsimonious but nontrivial covariance structure. If rho is set to zero, the X covariates will be independent and the simulation will run faster. |
beta |
A vector of length p containing the coefficients used to generate data from the chosen model. The default is beta=(1,1,1,1,1,0,...,0)^T. |
error |
The distribution of the error term: "gaussian", "t", or "cauchy". |
R |
A positive integer, number of outcome categories for multinomial (categorical) outcome Y. |
style |
Whether the categories in categorical data are balanced ("balanced") or not ("unbalanced"). |
lambda |
Controls the censoring rate in survival data. The censoring time is generated from an exponential distribution with mean 1/lambda. The default is lambda=0.1. |
order |
The number of interacting variables. The default is 2. |
type |
The type of multivariate response model; the two types use different mean and covariance structures to generate data. Specifically, type="a" follows Model 3.a and type="b" follows Model 3.b of Liu et al. (2020). |
model |
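The model used to generate the simulation data: one of "linear", "nonlinear", "binomial", "poisson", "classification", "Cox", "interaction", "group", "multivariate", or "AFT". |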
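To make the role of lambda more concrete, the following base R sketch draws censoring times from an exponential distribution with mean 1/lambda, as described for the lambda argument. It is an illustration only and is not taken from the MFSIS source: the event-time distribution (standard exponential) and the variable names event_time, censor_time, time and status are assumptions made for this example.

```r
# Illustrative sketch only: base R, not the MFSIS source code.
set.seed(1)
n <- 1000
lambda <- 0.1

# Hypothetical event times (assumption: exponential with rate 1)
event_time <- rexp(n, rate = 1)

# Censoring times: exponential with rate lambda, i.e. mean 1 / lambda
censor_time <- rexp(n, rate = lambda)

# The observed time is the smaller of the two; status = 1 marks an observed event
time <- pmin(event_time, censor_time)
status <- as.integer(event_time <= censor_time)

# Empirical censoring rate in this toy setup
mean(status == 0)
```

In this toy setup, with independent exponential event and censoring times, the censoring probability is lambda / (1 + lambda), so a larger lambda gives shorter censoring times and a higher censoring rate.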
Value
A list containing the simulated data.
Author(s)
Xuewei Cheng xwcheng@hunnu.edu.cn
References
Liu, W., Y. Ke, J. Liu, and R. Li (2020). Model-free feature screening and FDR control with knockoff features. Journal of the American Statistical Association, 1–16.
Examples
library(MFSIS)
n <- 100
p <- 200
rho <- 0.5
data <- Simdata(n, p, rho, error = "gaussian", model = "linear")
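For intuition about what these settings describe, the sketch below builds comparable data by hand in base R: covariates with an AR(1) correlation structure, the default beta, and Gaussian errors in a linear model. It is a minimal illustration under stated assumptions, not the package's implementation. The unit variances, the zero mean, the Cholesky construction, and the names Sigma, Z, X, eps and Y are all chosen for this example.

```r
# Illustrative sketch only: base R, not the MFSIS source code.
set.seed(1)
n <- 100
p <- 200
rho <- 0.5

# AR(1) correlation matrix: entry (i, j) equals rho^|i - j|
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Rows of X are i.i.d. N(0, Sigma): chol() returns upper-triangular U with
# t(U) %*% U = Sigma, so Z %*% U has covariance Sigma
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
X <- Z %*% chol(Sigma)

# Default coefficient vector: first five covariates active, the rest inactive
beta <- c(rep(1, 5), rep(0, p - 5))

# Gaussian errors; rt() or rcauchy() would give heavier-tailed alternatives
eps <- rnorm(n)
Y <- as.vector(X %*% beta + eps)

dim(X)
length(Y)
```

Setting rho to 0 in this sketch makes Sigma the identity matrix, so the columns of X are independent.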