sam.gen.ncpen {ncpen} | R Documentation |
sam.gen.ncpen: generate a simulated dataset.
Description
Generates a synthetic dataset for generalized linear models, with a design matrix drawn under a given correlation structure.
Usage
sam.gen.ncpen(n = 100, p = 50, q = 10, k = 3, r = 0.3,
cf.min = 0.5, cf.max = 1, corr = 0.5, seed = NULL,
family = c("gaussian", "binomial", "multinomial", "cox", "poisson"))
Arguments
n | (numeric) the number of samples.
p | (numeric) the number of variables.
q | (numeric) the number of nonzero coefficients.
k | (numeric) the number of classes for multinomial.
r | (numeric) the ratio of censoring for cox.
cf.min | (numeric) value of the minimum coefficient.
cf.max | (numeric) value of the maximum coefficient.
corr | (numeric) strength of correlations in the correlation structure.
seed | (numeric) seed for random number generation. By default, no seed is set.
family | (character) model type.
Details
A design matrix for regression models is generated from a multivariate normal distribution with a correlation structure.
The response variables are then computed from the chosen model using the true coefficients (see references).
Note that for cox, the censoring indicator is located in the last column of x.mat.
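The generation steps described above can be sketched in base R for the gaussian case. This is only an illustration, not the ncpen implementation: the helper name sketch.gen.gaussian is hypothetical, and the correlation structure corr^|i-j|, the evenly spaced nonzero coefficients, and the standard normal noise are all assumptions made for the sketch.

```r
### hypothetical helper: NOT the code used inside sam.gen.ncpen
sketch.gen.gaussian = function(n=100,p=50,q=10,cf.min=0.5,cf.max=1,corr=0.5,seed=NULL){
  if(!is.null(seed)) set.seed(seed)
  ### assumed correlation structure: entry (i,j) equals corr^|i-j|
  co.mat = corr^abs(outer(seq_len(p),seq_len(p),"-"))
  ### rows of z %*% chol(co.mat) follow N(0, co.mat)
  x.mat = matrix(rnorm(n*p),n,p) %*% chol(co.mat)
  ### assumed true coefficients: q nonzero values from cf.max down to cf.min
  b.vec = c(seq(cf.max,cf.min,length.out=q),rep(0,p-q))
  ### gaussian response with assumed standard normal noise
  y.vec = as.vector(x.mat %*% b.vec + rnorm(n))
  list(x.mat=x.mat,y.vec=y.vec,b.vec=b.vec)
}
sk = sketch.gen.gaussian(n=200,p=20,q=5,seed=1)
dim(sk$x.mat); sum(sk$b.vec!=0)
```

The returned list mirrors the x.mat, y.vec and b.vec components documented under Value, so it can be inspected in the same way as the output of sam.gen.ncpen.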
Value
A list containing
x.mat | design matrix.
y.vec | responses.
b.vec | true coefficients.
Author(s)
Dongshin Kim, Sunghoon Kwon, Sangin Lee
References
Kwon, S., Lee, S. and Kim, Y. (2016). Moderately clipped LASSO. Computational Statistics and Data Analysis, 92C, 53-67.
Kwon, S. and Kim, Y. (2012). Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions. Statistica Sinica, 629-653.
See Also
Examples
### linear regression
sam = sam.gen.ncpen(n=200,p=20,q=5,cf.min=0.5,cf.max=1,corr=0.5)
x.mat = sam$x.mat; y.vec = sam$y.vec
head(x.mat); head(y.vec)
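
A further example, sketched under the assumption that the ncpen package is installed, shows how the cox family might be used. It relies only on the documented arguments and on the note in Details that the censoring indicator sits in the last column of x.mat. The variable name cen.vec is introduced here for illustration.

```r
### cox regression with a fixed seed for reproducibility
library(ncpen)
sam = sam.gen.ncpen(n=200,p=20,q=5,r=0.3,cf.min=0.5,cf.max=1,corr=0.5,seed=1234,family="cox")
x.mat = sam$x.mat; y.vec = sam$y.vec
### per Details, the last column of x.mat holds the censoring indicator
cen.vec = x.mat[,ncol(x.mat)]
head(y.vec); table(cen.vec)
```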