sam.gen.ncpen {ncpen}    R Documentation

sam.gen.ncpen: generate a simulated dataset.

Description

Generates a synthetic dataset from a generalized linear model or a Cox model, with covariates drawn from a correlated multivariate normal distribution.

Usage

sam.gen.ncpen(n = 100, p = 50, q = 10, k = 3, r = 0.3,
  cf.min = 0.5, cf.max = 1, corr = 0.5, seed = NULL,
  family = c("gaussian", "binomial", "multinomial", "cox", "poisson"))

Arguments

n

(numeric) the number of samples.

p

(numeric) the number of variables.

q

(numeric) the number of nonzero coefficients.

k

(numeric) the number of classes; used only when family = "multinomial".

r

(numeric) the censoring rate; used only when family = "cox".

cf.min

(numeric) the minimum magnitude of the nonzero coefficients.

cf.max

(numeric) the maximum magnitude of the nonzero coefficients.

corr

(numeric) the parameter controlling the strength of correlation among the covariates.

seed

(numeric) seed for the random number generator. The default, NULL, sets no seed.

family

(character) the model family: one of "gaussian", "binomial", "multinomial", "cox", or "poisson".

Details

The design matrix is generated from a multivariate normal distribution with the given correlation structure. The responses are then generated from the specified model using the true coefficients (see References). Note that, for cox, the censoring indicator is stored in the last column of x.mat.
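The data-generating process described above can be sketched in base R for the gaussian family. This is a hypothetical illustration, not the ncpen implementation: the function name sim_gaussian_sketch, the AR(1)-type correlation corr^|i - j|, the evenly spaced positive coefficients, and the standard normal errors are all assumptions.

```r
# Hypothetical sketch of a gaussian simulation in the spirit of sam.gen.ncpen.
# ASSUMPTIONS (not taken from the ncpen source): correlation corr^|i - j|,
# nonzero coefficients evenly spaced from cf.max down to cf.min,
# and standard normal errors.
sim_gaussian_sketch = function(n = 100, p = 50, q = 10,
                               cf.min = 0.5, cf.max = 1, corr = 0.5,
                               seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # p x p correlation matrix with entries corr^|i - j|
  sigma = corr^abs(outer(1:p, 1:p, "-"))
  # chol() returns upper-triangular R with t(R) %*% R = sigma,
  # so each row of z %*% R has covariance sigma
  z = matrix(rnorm(n * p), n, p)
  x.mat = z %*% chol(sigma)
  # first q coefficients are nonzero, the remaining p - q are zero
  b.vec = c(seq(cf.max, cf.min, length.out = q), rep(0, p - q))
  # linear predictor plus standard normal noise
  y.vec = drop(x.mat %*% b.vec) + rnorm(n)
  list(x.mat = x.mat, y.vec = y.vec, b.vec = b.vec)
}

sam = sim_gaussian_sketch(n = 200, p = 20, q = 5, seed = 1)
dim(sam$x.mat)
sam$b.vec
```

Multiplying independent standard normal rows by the Cholesky factor of sigma is a standard way to draw correlated Gaussian covariates without extra packages; other families would replace the last step with a different response model.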

Value

A list containing the following components:

x.mat

design matrix.

y.vec

responses.

b.vec

true coefficients.

Author(s)

Dongshin Kim, Sunghoon Kwon, Sangin Lee

References

Kwon, S., Lee, S. and Kim, Y. (2016). Moderately clipped LASSO. Computational Statistics and Data Analysis, 92C, 53-67.

Kwon, S. and Kim, Y. (2012). Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions. Statistica Sinica, 629-653.

See Also

ncpen

Examples

### linear regression
sam = sam.gen.ncpen(n = 200, p = 20, q = 5, cf.min = 0.5, cf.max = 1, corr = 0.5)
x.mat = sam$x.mat; y.vec = sam$y.vec
head(x.mat); head(y.vec)
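The Details note about the cox family can be checked directly. The following usage sketch assumes ncpen is installed; it relies only on the documented arguments and on the Details statement that the censoring indicator is the last column of x.mat, and it makes no assumption about which indicator value codes an event.

```r
### cox regression: the censoring indicator is the last column of x.mat
library(ncpen)
sam = sam.gen.ncpen(n = 200, p = 20, q = 5, r = 0.3, family = "cox", seed = 1)
x.mat = sam$x.mat; y.vec = sam$y.vec
cens = x.mat[, ncol(x.mat)]   # censoring indicator, per Details
table(cens)                   # counts of each indicator value
head(y.vec)
```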

[Package ncpen version 1.0.0 Index]