Simdata {MFSIS}    R Documentation

Generate simulation data (a unified framework for generating simulation data)

Description

This function quickly generates simulated data. You only need to supply the sample size n, the dimension p, and the correlation parameter rho of the data you want to generate. Many models are supported (see the model argument).

Usage

Simdata(
  n,
  p,
  rho,
  beta = c(rep(1, 5), rep(0, p - 5)),
  error = c("gaussian", "t", "cauchy"),
  R = 3,
  style = c("balanced", "unbalanced"),
  lambda = 0.1,
  order = 2,
  type = c("a", "b"),
  model = c("linear", "nonlinear", "binomial", "poisson", "classification", "Cox",
    "interaction", "group", "multivariate", "AFT")
)

Arguments

n

Number of subjects in the dataset to be simulated. It also equals the number of rows in the simulated dataset, because each row is assumed to represent a different independent and identically distributed subject.

p

Number of predictor variables (covariates) in the simulated dataset. These covariates will be the features screened by model-free procedures.

rho

The correlation between adjacent covariates in the simulated matrix X. The within-subject covariance matrix of X is assumed to have the form of an AR(1) autoregressive covariance matrix, with (i, j) entry rho^|i - j|. This does not imply that the X covariates for each subject actually form a time series; AR(1) is simply used as an example of a parsimonious but nontrivial covariance structure. If rho = 0, the X covariates are independent and the simulation runs faster.
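As an illustration of the AR(1) structure described above, the sketch below builds the covariance matrix with entries rho^|i - j| and draws Gaussian rows from it via a Cholesky factor. The helper name ar1_design and the use of a multivariate normal distribution for X are assumptions made for illustration; this is not necessarily how Simdata generates X internally.

```r
# Hypothetical helper (not part of MFSIS): draw an n x p design matrix whose
# rows are i.i.d. N(0, Sigma) with AR(1) covariance Sigma[i, j] = rho^|i - j|.
ar1_design <- function(n, p, rho) {
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)  # i.i.d. standard normals
  if (rho == 0) {
    return(Z)                                    # independent covariates
  }
  Sigma <- rho^abs(outer(1:p, 1:p, "-"))         # AR(1) covariance matrix
  Z %*% chol(Sigma)  # chol() returns U with t(U) %*% U = Sigma
}

set.seed(1)
X <- ar1_design(n = 2000, p = 5, rho = 0.5)
round(cor(X[, 1], X[, 2]), 2)  # close to 0.5 (adjacent covariates)
round(cor(X[, 1], X[, 3]), 2)  # close to 0.25 = 0.5^2
```

Skipping the Cholesky step when rho = 0 mirrors the remark above that independent covariates are cheaper to simulate.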

beta

A coefficient vector of length p for the chosen model. The default is beta = (1, 1, 1, 1, 1, 0, ..., 0)^T, i.e. only the first five covariates are active.
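The default beta is the expression shown in the Usage section, so it always has length p with the first five entries active. The sketch below reproduces that default and then, assuming the "linear" model takes the usual form Y = X beta + epsilon with independent covariates and standard normal errors (an assumption; this page does not spell out the model), generates a response.

```r
# Default coefficient vector from the Usage section: five ones, then p - 5 zeros.
p <- 200
beta <- c(rep(1, 5), rep(0, p - 5))
length(beta)      # equals p, i.e. 200
which(beta != 0)  # the active covariates: 1 2 3 4 5

# Assumed linear model Y = X %*% beta + epsilon (illustration only).
set.seed(1)
n <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- drop(X %*% beta) + rnorm(n)
```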

error

The distribution of the error term: "gaussian", "t", or "cauchy".
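The three error options correspond to standard distributions available in base R. The sketch below draws from each; the helper name draw_error is hypothetical, and the degrees of freedom for the t distribution (df = 3 here) is an assumed value, since this page does not state what Simdata uses.

```r
# Hypothetical helper (not part of MFSIS): draw n error terms from one of the
# three distributions accepted by the error argument.
draw_error <- function(n, error = c("gaussian", "t", "cauchy"), df = 3) {
  error <- match.arg(error)    # defaults to the first choice, "gaussian"
  switch(error,
    gaussian = rnorm(n),       # standard normal
    t        = rt(n, df = df), # Student t; df = 3 is an assumed value
    cauchy   = rcauchy(n)      # heavy-tailed, no finite mean or variance
  )
}

set.seed(1)
eps <- draw_error(100, "cauchy")
```

The t and Cauchy options produce heavier tails than the Gaussian, which is a common way to stress-test the robustness of screening procedures.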

R

A positive integer giving the number of outcome categories for the multinomial (categorical) outcome Y.

style

Whether the categories of the categorical outcome are balanced ("balanced") or unbalanced ("unbalanced").
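To make the R and style arguments concrete, the sketch below assigns n subjects to R categories either in equal-sized groups or with unequal probabilities. The helper draw_categories is hypothetical, and the unbalanced probabilities (proportional to 1, 2, ..., R) are an assumed choice for illustration, not the proportions used by Simdata.

```r
# Hypothetical helper (not part of MFSIS): assign n subjects to R categories.
# "balanced": every category has (nearly) the same size.
# "unbalanced": category probabilities proportional to 1, 2, ..., R
#               (an assumed choice made only for illustration).
draw_categories <- function(n, R = 3, style = c("balanced", "unbalanced")) {
  style <- match.arg(style)
  if (style == "balanced") {
    y <- sample(rep(seq_len(R), length.out = n))  # shuffle equal-sized groups
  } else {
    y <- sample(seq_len(R), size = n, replace = TRUE, prob = seq_len(R))
  }
  factor(y, levels = seq_len(R))  # keep all R levels even if one is empty
}

set.seed(1)
table(draw_categories(90, R = 3, style = "balanced"))  # 30 in each category
```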

lambda

This parameter controls the censoring rate in survival data. The censoring time is generated from an exponential distribution with mean 1/lambda. The default is lambda = 0.1.
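A minimal sketch of the censoring mechanism described above: censoring times are drawn from an exponential distribution with rate lambda (so mean 1/lambda), and each subject's observed time is the smaller of its event time and censoring time. The rate-1 exponential event times are an assumption for illustration only; in Simdata the event times come from the chosen model.

```r
set.seed(1)
n <- 10000
lambda <- 0.1
event_time  <- rexp(n, rate = 1)       # assumed event-time distribution
censor_time <- rexp(n, rate = lambda)  # exponential with mean 1 / lambda
obs_time <- pmin(event_time, censor_time)
status   <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

mean(censor_time)  # close to 1 / lambda = 10
1 - mean(status)   # empirical censoring rate
```

With these assumed rates, the theoretical censoring probability is lambda / (1 + lambda), about 0.09, so increasing lambda shortens the censoring times and raises the censoring rate.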

order

The number of interacting variables (the order of the interaction); the default is 2.
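As a rough illustration of what the order of an interaction means, the sketch below forms an interaction term as the elementwise product of a chosen set of covariates. The helper interaction_term and the choice of the first k covariates are hypothetical; this page does not describe how Simdata builds its interaction model.

```r
# Hypothetical illustration (not the MFSIS implementation): an interaction
# term of order k is the elementwise product of k covariates.
interaction_term <- function(X, vars) {
  apply(X[, vars, drop = FALSE], 1, prod)  # row-wise product over vars
}

set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
k <- 2  # plays the role of the order argument
x12 <- interaction_term(X, vars = seq_len(k))  # X[, 1] * X[, 2]
all.equal(x12, X[, 1] * X[, 2])                # TRUE
```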

type

The type of multivariate response model; the two types use different mean and covariance structures to generate data. Specifically, type = "a" follows Model 3.a and type = "b" follows Model 3.b of Liu et al. (2020).

model

The model used to generate the simulated data.

Value

A list containing the simulated data.

Author(s)

Xuewei Cheng xwcheng@csu.edu.cn

References

Liu, W., Y. Ke, J. Liu, and R. Li (2020). Model-free feature screening and FDR control with knockoff features. Journal of the American Statistical Association, 1–16.

Examples

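library(MFSIS)
n <- 100    # number of subjects
p <- 200    # number of covariates
rho <- 0.5  # AR(1) correlation between adjacent covariates
data <- Simdata(n, p, rho, error = "gaussian", model = "linear")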

[Package MFSIS version 0.2.0 Index]