gen.data {BeSS} | R Documentation |
Generate simulated data
Description
Generate data for simulations under the generalized linear model and Cox model.
Usage
gen.data(n, p, family, K, rho = 0, sigma = 1, beta = NULL, censoring = TRUE,
c = 1, scal)
Arguments
n |
The number of observations. |
p |
The number of predictors of interest. |
family |
The distribution of the simulated data: "gaussian", "binomial", or "cox" (see Details). |
K |
The number of nonzero coefficients in the underlying regression model. |
rho |
A parameter used to characterize the pairwise correlation in predictors. Default is 0. |
sigma |
A parameter used to control the signal-to-noise ratio. For linear regression, it determines the variance of the error term \epsilon (see Details). Default is 1. |
beta |
The coefficient values in the underlying regression model. |
censoring |
Whether the data are censored. Default is TRUE. |
c |
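A parameter controlling the censoring rate: censoring times are drawn from the uniform distribution on [0, c] (see Details). Default is 1. |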
scal |
A parameter used in generating survival times based on the Weibull distribution. Only used for the "cox" family. |
Details
For the design matrix X, we first generate an n x p random Gaussian matrix \bar{X} whose entries are i.i.d. N(0,1) and then normalize its columns to length \sqrt{n}. The design matrix X is then generated with X_j = \bar{X}_j + \rho(\bar{X}_{j+1} + \bar{X}_{j-1}) for j = 2, \dots, p-1.
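The construction of the design matrix can be sketched in base R as follows. This is only an illustrative sketch, not the package's own code: the variable names (`Xbar`, `col_norms`) are made up here, and leaving the first and last columns equal to \bar{X}_1 and \bar{X}_p is an assumption, since the text defines X_j only for j = 2, \dots, p-1.

```r
set.seed(1)
n <- 100
p <- 6
rho <- 0.2

# Step 1: n x p matrix with i.i.d. N(0, 1) entries.
Xbar <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Step 2: rescale each column to Euclidean length sqrt(n).
col_norms <- sqrt(colSums(Xbar^2))
Xbar <- sweep(Xbar, 2, col_norms, "/") * sqrt(n)

# Step 3: mix each interior column with its two neighbours.
# Assumption: columns 1 and p are kept equal to Xbar, because the
# formula is only stated for j = 2, ..., p - 1.
X <- Xbar
for (j in 2:(p - 1)) {
  X[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
}
```

With rho = 0 the loop leaves every column unchanged, so X equals \bar{X}; larger values of rho increase the correlation between neighbouring predictors.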
For the "gaussian" family, the data model is
Y = X \beta + \epsilon, where \epsilon \sim N(0, \sigma^2).
The nonzero entries of the underlying regression coefficient \beta are drawn from the uniform distribution on [m, 100m], where m = 5 \sqrt{2 \log(p)/n}.
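As a rough illustration of the Gaussian model, the following base R sketch draws a sparse \beta and a response Y. It is a sketch under stated assumptions rather than the package's implementation: the design matrix is a plain standard Gaussian stand-in, the positions of the K nonzero coefficients are chosen uniformly at random, and the noise is drawn with standard deviation sigma to match \epsilon \sim N(0, \sigma^2).

```r
set.seed(2)
n <- 200
p <- 20
K <- 5
sigma <- 1

# Stand-in design matrix (assumption: no neighbour correlation here).
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Lower bound m of the coefficient range; the upper bound is 100 * m.
m <- 5 * sqrt(2 * log(p) / n)

# Sparse coefficient vector with K nonzero entries.
# Assumption: the nonzero positions are sampled at random.
beta <- numeric(p)
nonzero <- sample(p, K)
beta[nonzero] <- runif(K, min = m, max = 100 * m)

# Response: Y = X beta + epsilon, epsilon ~ N(0, sigma^2).
epsilon <- rnorm(n, mean = 0, sd = sigma)
y <- drop(X %*% beta + epsilon)
```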
For the "binomial" family, the data model is
Prob(Y = 1) = \exp(X \beta)/(1 + \exp(X \beta)).
The nonzero entries of the underlying regression coefficient \beta are drawn from the uniform distribution on [2m, 10m], where m = 5 \sigma \sqrt{2 \log(p)/n}.
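The logistic model above can be sketched in base R along the same lines. Again this is an illustrative sketch, not the package's code: the design matrix is a standard Gaussian stand-in and the random choice of nonzero positions is an assumption. `plogis(eta)` computes exp(eta)/(1 + exp(eta)).

```r
set.seed(3)
n <- 200
p <- 20
K <- 5
sigma <- 1

# Stand-in design matrix (assumption: no neighbour correlation here).
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Coefficient range [2m, 10m], with m scaled by sigma.
m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
nonzero <- sample(p, K)  # assumption: random nonzero positions
beta[nonzero] <- runif(K, min = 2 * m, max = 10 * m)

# Success probabilities from the logistic link, then Bernoulli draws.
eta <- drop(X %*% beta)
prob <- plogis(eta)
y <- rbinom(n, size = 1, prob = prob)
```

Because m is proportional to sigma here, a larger sigma gives larger coefficients and hence probabilities closer to 0 or 1.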
For the "cox" family, the data model is
T = (-\log(S(t)) / \exp(X \beta))^{1/scal}.
The censoring time C is generated from the uniform distribution on [0, c]; we then define the censoring status as \delta = I{T \le C} and the observed time as R = \min{T, C}.
The nonzero entries of the underlying regression coefficient \beta are drawn from the uniform distribution on [2m, 10m], where m = 5 \sigma \sqrt{2 \log(p)/n}.
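The survival-time construction can be sketched in base R as follows. This is a sketch under assumptions, not the package's implementation: S(t) is replaced by a Uniform(0, 1) draw (the usual inverse-transform device), the design matrix is a standard Gaussian stand-in, the nonzero positions are random, and the values of `c_max` (standing for the argument c) and `scal` are picked only for illustration.

```r
set.seed(4)
n <- 200
p <- 20
K <- 5
sigma <- 1
c_max <- 10  # hypothetical value for the argument c
scal <- 10   # hypothetical value for the Weibull parameter

# Stand-in design matrix (assumption: no neighbour correlation here).
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Coefficient range [2m, 10m], with m scaled by sigma.
m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
nonzero <- sample(p, K)  # assumption: random nonzero positions
beta[nonzero] <- runif(K, min = 2 * m, max = 10 * m)

# Survival times: T = (-log(S) / exp(X beta))^(1 / scal),
# with S drawn from Uniform(0, 1) (assumption).
eta <- drop(X %*% beta)
s <- runif(n)
surv_time <- (-log(s) / exp(eta))^(1 / scal)

# Censoring times C ~ Uniform[0, c], status and observed time.
cens_time <- runif(n, min = 0, max = c_max)
delta <- as.integer(surv_time <= cens_time)  # 1 = event observed
R <- pmin(surv_time, cens_time)
```

A smaller upper bound for the censoring times makes C fall below T more often, so more observations end up censored (delta = 0).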
Value
A list with the following components: x, y, Tbeta.
x |
Design matrix of predictors. |
y |
Response variable. |
Tbeta |
The coefficients used in the underlying regression model. |
Author(s)
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
References
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
Examples
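library(BeSS)

# Generate simulated data: 500 observations, 20 predictors,
# 10 of which have nonzero coefficients
n <- 500
p <- 20
K <- 10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection on the simulated design matrix and response
fit <- bess(data$x, data$y, family = "gaussian")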