gen.data {bestridge} | R Documentation |
Generate simulated data
Description
Generate data for simulations under the generalized linear model and Cox model.
Usage
gen.data(
n,
p,
k = NULL,
rho = 0,
family = c("gaussian", "binomial", "poisson", "cox"),
beta = NULL,
cortype = 1,
snr = 10,
censoring = TRUE,
c = 1,
scal,
sigma = 1,
seed = 1
)
Arguments
n |
The number of observations. |
p |
The number of predictors of interest. |
k |
The number of nonzero coefficients in the underlying regression
model. Can be omitted if |
rho |
A parameter used to characterize the pairwise correlation in
predictors. Default is 0. |
family |
The distribution of the simulated data. |
beta |
The coefficient values in the underlying regression model. |
cortype |
The correlation structure. |
snr |
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as
the variance of |
censoring |
Whether the data are censored or not. Valid only for family = "cox". |
c |
The censoring rate. Default is 1. |
scal |
A parameter used in generating the survival time based on the Weibull distribution. Only used for the "cox" family. |
sigma |
A parameter used to control the signal-to-noise ratio. For linear regression,
it is the error variance |
seed |
The seed used in generating the random numbers. |
Details
We generate an n \times p random Gaussian matrix X with mean 0 and a covariance matrix with an exponential structure
or a constant structure. For the exponential structure, the (i,j) entry of the covariance matrix
equals \rho^{|i-j|}. For the constant structure, the (i,j) entry of the covariance matrix is \rho
for every i \neq j and 1 elsewhere. For the moving average structure,
we first generate an n \times p random Gaussian matrix \bar{X} whose entries are i.i.d. N(0,1)
and then normalize its columns to have length \sqrt{n}. The design matrix X is then generated with
X_j = \bar{X}_j + \rho(\bar{X}_{j+1}+\bar{X}_{j-1}) for j=2,\dots,p-1.
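The constructions above can be sketched in base R. This is an illustration of the formulas only, not the package's internal code: the helper `draw_gaussian` and the variable names are hypothetical, and the treatment of the first and last columns in the moving average case is an assumption, since the formula only covers j=2,\dots,p-1.

```r
# Sketch of the design-matrix constructions described above (base R only).
set.seed(1)
n <- 200; p <- 6; rho <- 0.4

# Exponential structure: Sigma[i, j] = rho^|i - j|
Sigma_exp <- rho^abs(outer(1:p, 1:p, "-"))

# Constant structure: rho off the diagonal, 1 on the diagonal
Sigma_const <- matrix(rho, p, p)
diag(Sigma_const) <- 1

# Hypothetical helper: draw n rows from N(0, Sigma) using the Cholesky factor.
# chol() returns R with t(R) %*% R = Sigma, so Z %*% R has covariance Sigma.
draw_gaussian <- function(n, Sigma) {
  Z <- matrix(rnorm(n * ncol(Sigma)), n, ncol(Sigma))
  Z %*% chol(Sigma)
}
X_exp <- draw_gaussian(n, Sigma_exp)
X_const <- draw_gaussian(n, Sigma_const)

# Moving average structure: i.i.d. N(0, 1) entries, columns rescaled to length sqrt(n)
Xbar <- matrix(rnorm(n * p), n, p)
Xbar <- sweep(Xbar, 2, sqrt(colSums(Xbar^2)), "/") * sqrt(n)
X_ma <- Xbar  # assumption: columns 1 and p are left unchanged
for (j in 2:(p - 1)) {
  X_ma[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
}
```

With a large n, `cor(X_exp)` should be close to `Sigma_exp`, which gives a quick sanity check on the chosen `rho`.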
For family = "gaussian", the data model is
Y = X \beta + \epsilon.
The underlying regression coefficient \beta has uniform distribution [m, 100m], m = 5 \sqrt{2\log(p)/n}.
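A minimal base R sketch of the Gaussian model follows. It assumes an independent stand-in design matrix, a support of size k chosen uniformly at random, and a noise standard deviation `sigma` supplied directly; none of these choices are confirmed by the text above.

```r
# Sketch of the "gaussian" data model: Y = X beta + epsilon.
set.seed(1)
n <- 200; p <- 20; k <- 5; sigma <- 1

X <- matrix(rnorm(n * p), n, p)        # stand-in design matrix (assumption)

m <- 5 * sqrt(2 * log(p) / n)
beta <- numeric(p)
support <- sample(p, k)                # assumption: support chosen at random
beta[support] <- runif(k, m, 100 * m)  # nonzero coefficients ~ Uniform[m, 100m]

y <- drop(X %*% beta + rnorm(n, 0, sigma))
```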
For family = "binomial", the data model is
Prob(Y = 1) = \exp(X \beta + \epsilon)/(1 + \exp(X \beta + \epsilon)).
The underlying regression coefficient \beta has uniform distribution [2m, 10m], m = 5\sigma \sqrt{2\log(p)/n}.
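The binomial model can be sketched the same way. The code below assumes a stand-in design matrix and a randomly chosen support, and uses `plogis`, which computes \exp(\eta)/(1 + \exp(\eta)).

```r
# Sketch of the "binomial" data model.
set.seed(1)
n <- 200; p <- 20; k <- 5; sigma <- 1

X <- matrix(rnorm(n * p), n, p)              # stand-in design matrix (assumption)

m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
beta[sample(p, k)] <- runif(k, 2 * m, 10 * m)  # nonzero coefficients ~ Uniform[2m, 10m]

eta <- drop(X %*% beta + rnorm(n, 0, sigma))   # linear predictor plus noise
prob <- plogis(eta)                            # exp(eta) / (1 + exp(eta))
y <- rbinom(n, 1, prob)                        # Bernoulli draws with Prob(Y = 1) = prob
```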
For family = "poisson", the data is modeled to have an exponential distribution:
Y = Exp(\exp(X \beta + \epsilon)).
For family = "cox", the data model is
T = (-\log(S(t))/\exp(X \beta))^{1/scal}.
The censoring time C is generated from the uniform distribution [0, c],
then we define the censor status as \delta = I\{T \leq C\} and R = \min\{T, C\}.
The underlying regression coefficient \beta has uniform distribution [2m, 10m], m = 5\sigma \sqrt{2\log(p)/n}.
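The survival model can be sketched in base R as follows. The sketch assumes that S(t) is drawn as a Uniform(0, 1) variable (the usual inverse-transform approach), that the design matrix is an independent stand-in, and that `scal = 10` and `c_max = 1` are reasonable illustrative values; `c_max` is a hypothetical name for the argument c.

```r
# Sketch of the "cox" data model with uniform censoring.
set.seed(1)
n <- 200; p <- 20; k <- 5; sigma <- 1
scal <- 10; c_max <- 1                       # illustrative values (assumption)

X <- matrix(rnorm(n * p), n, p)              # stand-in design matrix (assumption)

m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
beta[sample(p, k)] <- runif(k, 2 * m, 10 * m)

u <- runif(n)                                # assumption: S(t) ~ Uniform(0, 1)
time <- (-log(u) / exp(drop(X %*% beta)))^(1 / scal)  # T

cens <- runif(n, 0, c_max)                   # C ~ Uniform[0, c]
status <- as.integer(time <= cens)           # delta = I{T <= C}
obs <- pmin(time, cens)                      # R = min{T, C}
```

The pair (`obs`, `status`) has the shape expected by standard survival tooling, with `status == 1` marking an observed event.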
In the above models, \epsilon \sim N(0, \sigma^2), where \sigma^2 is determined by the snr.
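One common way to tie \sigma^2 to a signal-to-noise ratio is sketched below. It assumes the convention snr = Var(X \beta)/\sigma^2, which is an assumption for illustration and may differ from the definition the package uses.

```r
# Sketch: derive the noise variance from a target SNR,
# assuming snr = Var(X beta) / sigma^2.
set.seed(1)
n <- 200; p <- 20; k <- 5; snr <- 10

X <- matrix(rnorm(n * p), n, p)          # stand-in design matrix (assumption)
beta <- numeric(p)
beta[sample(p, k)] <- 1                  # illustrative coefficients (assumption)

signal <- drop(X %*% beta)
sigma2 <- var(signal) / snr              # noise variance implied by the assumed convention
eps <- rnorm(n, 0, sqrt(sigma2))         # epsilon ~ N(0, sigma^2)
y <- signal + eps
```

Under this convention, a larger `snr` shrinks `sigma2` and makes the nonzero coefficients easier to recover.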
Value
x |
Design matrix of predictors. |
y |
Response variable. |
Tbeta |
The coefficients used in the underlying regression model. |
Author(s)
Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.
Examples
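library(bestridge)

# Generate simulated data
n <- 200
p <- 20
k <- 5
rho <- 0.4
SNR <- 10
cortype <- 1
seed <- 10
Data <- gen.data(n, p, k, rho, family = "gaussian", cortype = cortype, snr = SNR, seed = seed)

# Split into a training set (first 140 rows) and a test set (last 60 rows)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]

# Grid of 10 ridge penalties, log-spaced from 5 down to 0.1
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))

# Fit best subset ridge regression on the training set
lm.bsrr <- bsrr(x, y, method = "pgsection")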