gen.data {bestridge} | R Documentation |
Generate simulated data
Description
Generate data for simulations under the generalized linear model and Cox model.
Usage
gen.data(
n,
p,
k = NULL,
rho = 0,
family = c("gaussian", "binomial", "poisson", "cox"),
beta = NULL,
cortype = 1,
snr = 10,
censoring = TRUE,
c = 1,
scal,
sigma = 1,
seed = 1
)
Arguments
n |
The number of observations. |
p |
The number of predictors of interest. |
k |
The number of nonzero coefficients in the underlying regression
model. Can be omitted if beta is supplied. |
rho |
A parameter used to characterize the pairwise correlation in
predictors. Default is 0. |
family |
The distribution of the simulated data. |
beta |
The coefficient values in the underlying regression model. |
cortype |
The correlation structure. |
snr |
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as
the variance of |
censoring |
Whether the data are censored. Valid only for family = "cox". Default is TRUE. |
c |
The censoring rate. Default is 1. |
scal |
A parameter used in generating survival times from the Weibull distribution. Only used for the "cox" family. |
sigma |
A parameter used to control the signal-to-noise ratio. For linear regression,
it is the error variance |
seed |
Seed to be used in generating the random numbers. |
Details
We generate an n x p random Gaussian matrix X
with mean 0 and a covariance matrix with an exponential structure
or a constant structure. For the exponential structure, the (i, j)
entry of the covariance matrix equals rho^{|i-j|}. For the constant structure,
the (i, j) entry of the covariance matrix is rho for every i != j
and 1 elsewhere. For the moving average structure, we first generate an
n x p random Gaussian matrix \bar{X} whose entries are i.i.d. N(0, 1)
and then normalize its columns to length \sqrt{n}. The design matrix X
is then generated with
X_j = \bar{X}_j + rho (\bar{X}_{j+1} + \bar{X}_{j-1})
for j = 2, ..., p - 1.
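The three correlation structures can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper names make_sigma, sim_design and sim_ma are hypothetical, and the moving average formula X_j = \bar{X}_j + rho (\bar{X}_{j+1} + \bar{X}_{j-1}) is assumed.

```r
# Hypothetical helper: p x p covariance matrix for the exponential
# or constant structure.
make_sigma <- function(p, rho, structure = c("exponential", "constant")) {
  structure <- match.arg(structure)
  if (structure == "exponential") {
    # (i, j) entry is rho^|i - j|
    rho^abs(outer(seq_len(p), seq_len(p), "-"))
  } else {
    # rho off the diagonal, 1 on the diagonal
    Sigma <- matrix(rho, p, p)
    diag(Sigma) <- 1
    Sigma
  }
}

# Hypothetical helper: n x p Gaussian matrix with mean 0 and covariance Sigma.
# chol() returns an upper triangular R with t(R) %*% R == Sigma.
sim_design <- function(n, Sigma) {
  p <- ncol(Sigma)
  Z <- matrix(rnorm(n * p), n, p)
  Z %*% chol(Sigma)
}

# Hypothetical helper: moving average structure (assumed formula).
sim_ma <- function(n, p, rho) {
  Xbar <- matrix(rnorm(n * p), n, p)
  # Rescale every column to Euclidean length sqrt(n)
  Xbar <- sweep(Xbar, 2, sqrt(colSums(Xbar^2)), "/") * sqrt(n)
  X <- Xbar
  if (p >= 3) {
    j <- 2:(p - 1)
    X[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
  }
  X
}

set.seed(1)
Sigma_exp <- make_sigma(5, 0.4, "exponential")
Sigma_con <- make_sigma(5, 0.4, "constant")
X_exp <- sim_design(200, Sigma_exp)
X_ma <- sim_ma(50, 6, 0.4)
```

Multiplying i.i.d. standard normal rows by the Cholesky factor is one common way to impose a target covariance. It avoids any dependency outside base R.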
For family = "gaussian", the data model is
Y = X \beta + \epsilon.
The underlying regression coefficient \beta has uniform distribution on [m, 100m].
For family = "binomial", the data model is
Prob(Y = 1) = \exp(X \beta + \epsilon) / (1 + \exp(X \beta + \epsilon)).
The underlying regression coefficient \beta has uniform distribution on [2m, 10m].
For family = "poisson", the data is modeled to have an exponential distribution:
Y = Exp(\exp(X \beta + \epsilon)).
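As a rough illustration of the Gaussian and binomial models, the sketch below builds a sparse coefficient vector and draws both kinds of response in base R. It assumes the noise enters the linear predictor in both models, and the coefficient range and noise level are arbitrary illustrative values, not the ones used by gen.data.

```r
set.seed(1)
n <- 100   # observations
p <- 10    # predictors
k <- 3     # nonzero coefficients

X <- matrix(rnorm(n * p), n, p)

# Sparse coefficient vector: k randomly placed nonzero entries.
# The range [1, 2] is an arbitrary choice for this sketch.
beta <- numeric(p)
support <- sample(p, k)
beta[support] <- runif(k, min = 1, max = 2)

# Noisy linear predictor (assumed form: X beta + eps)
eps <- rnorm(n, mean = 0, sd = 1)
eta <- drop(X %*% beta) + eps

# Gaussian response: the noisy linear predictor itself
y_gaussian <- eta

# Binomial response: Bernoulli draws with success probability
# exp(eta) / (1 + exp(eta)), computed stably by plogis()
prob <- plogis(eta)
y_binomial <- rbinom(n, size = 1, prob = prob)
```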
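For family = "cox", the data model is
T = (-\log(S(t)) / \exp(X \beta))^{1/scal}.
The censoring time C is generated from a uniform distribution on [0, c];
we then define the censoring status as \delta = I(T <= C)
and the observed time as R = \min(T, C).
The underlying regression coefficient \beta
has uniform distribution on [2m, 10m].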
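The censoring mechanism for the Cox family can be sketched as below. This is a hedged illustration rather than the package's code: it assumes Weibull-type failure times obtained by inverse transform from a uniform draw, censoring times uniform on [0, c_max], and illustrative values for scal, c_max and beta. All variable names are hypothetical.

```r
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 3), n, 3)
beta <- c(0.5, -0.5, 0.25)   # illustrative coefficients
scal <- 2                    # Weibull-type parameter (illustrative value)
c_max <- 2                   # upper bound of the censoring time (assumed)

# Failure times by inverse transform: U plays the role of the survival
# probability S(t), assumed uniform on (0, 1)
U <- runif(n)
fail_time <- (-log(U) / exp(drop(X %*% beta)))^(1 / scal)

# Censoring times and the resulting observed data
cens_time <- runif(n, min = 0, max = c_max)
status <- as.integer(fail_time <= cens_time)  # 1 = event observed, 0 = censored
obs_time <- pmin(fail_time, cens_time)

y_cox <- cbind(time = obs_time, status = status)
```

A smaller c_max shortens the censoring times and so increases the share of censored observations.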
In the above models, \epsilon ~ N(0, \sigma^2), where \sigma^2 is determined by the snr.
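One way the noise variance could follow from snr is sketched below. It assumes the SNR is the variance of the noiseless signal X \beta divided by the noise variance \sigma^2. That definition is an assumption made for this example, and the variable names are illustrative.

```r
set.seed(1)
n <- 10000
p <- 20
k <- 5
snr <- 10

X <- matrix(rnorm(n * p), n, p)
beta <- numeric(p)
beta[seq_len(k)] <- 1          # illustrative nonzero coefficients

signal <- drop(X %*% beta)

# Assumed definition: snr = Var(X beta) / sigma^2, solved for sigma^2
sigma2 <- var(signal) / snr

eps <- rnorm(n, mean = 0, sd = sqrt(sigma2))
y <- signal + eps

# The empirical ratio should be close to the requested snr for large n
empirical_snr <- var(signal) / var(eps)
```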
Value
x |
Design matrix of predictors. |
y |
Response variable. |
Tbeta |
The coefficients used in the underlying regression model. |
Author(s)
Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.
See Also
Examples
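library(bestridge)

# Generate simulated data
n <- 200        # number of observations
p <- 20         # number of predictors
k <- 5          # number of nonzero coefficients
rho <- 0.4      # correlation parameter
SNR <- 10       # signal-to-noise ratio
cortype <- 1    # correlation structure
seed <- 10
Data <- gen.data(n, p, k, rho, family = "gaussian", cortype = cortype, snr = SNR, seed = seed)

# Split into a training set (first 140 rows) and a test set (last 60 rows)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]

# Candidate lambda values on a log scale (not used by the call below)
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))

# Fit a best subset ridge regression model on the training set
lm.bsrr <- bsrr(x, y, method = "pgsection")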