generate.data {abess} | R Documentation |
Generate simulated data
Description
Generate simulated data under the generalized linear model and Cox proportional hazard model.
Usage
generate.data(
  n,
  p,
  support.size = NULL,
  rho = 0,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial",
    "gamma", "ordinal"),
  beta = NULL,
  cortype = 1,
  snr = 10,
  sigma = NULL,
  weibull.shape = 1,
  uniform.max = 1,
  y.dim = 3,
  class.num = 3,
  seed = 1
)
Arguments
n |
The number of observations. |
p |
The number of predictors of interest. |
support.size |
The number of nonzero coefficients in the underlying regression
model. Can be omitted if beta is supplied. |
rho |
A parameter used to characterize the pairwise correlation in
predictors. Default is 0. |
family |
The distribution of the simulated response. |
beta |
The coefficient values in the underlying regression model.
If it is supplied, support.size is ignored. |
cortype |
The correlation structure.
|
snr |
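A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined
as the variance of $x\beta$ divided by the variance of the Gaussian noise: $\mathrm{Var}(x\beta)/\sigma^2$. Default is snr = 10. |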
sigma |
The variance of the Gaussian noise. Default sigma = NULL implies it is determined by snr. |
weibull.shape |
The shape parameter of the Weibull distribution.
It works only when family = "cox". |
uniform.max |
A parameter controlling the censoring rate.
A large value implies a low censoring rate;
a small value implies a high censoring rate.
It works only when family = "cox". |
y.dim |
The dimension of the response. It works only when family = "mgaussian". |
class.num |
The number of classes. It works only when family = "multinomial". |
seed |
Random seed. Default: seed = 1. |
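The link between snr and the noise variance can be illustrated with a short base-R sketch. It assumes the SNR is $\mathrm{Var}(x\beta)/\sigma^2$ and uses independent standard-normal predictors; `beta_demo`, `sigma2`, and `empirical_snr` are hypothetical names chosen for illustration, and this is not the package's internal code.

```r
# Sketch: derive the noise variance implied by a target SNR.
# Assumption: SNR = Var(x %*% beta) / sigma^2; all names are illustrative.
set.seed(1)
n <- 1000
p <- 5
x <- matrix(rnorm(n * p), n, p)       # independent predictors
beta_demo <- c(2, -1, 0, 0, 1.5)      # hypothetical coefficients
snr <- 10

signal <- drop(x %*% beta_demo)       # linear predictor x beta
sigma2 <- var(signal) / snr           # noise variance implied by snr
y <- signal + rnorm(n, mean = 0, sd = sqrt(sigma2))

# By construction, the ratio of signal variance to noise variance equals snr.
empirical_snr <- var(signal) / sigma2
```

A larger snr shrinks `sigma2`, so the simulated response tracks the linear predictor more closely.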
Details
For family = "gaussian", the data model is
$$Y = X\beta + \epsilon.$$
The underlying regression coefficient $\beta$ has
uniform distribution [m, 100m] and $m = 5\sqrt{2\log(p)/n}$.
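As a rough illustration of the Gaussian case, the base-R sketch below draws a sparse coefficient vector from a uniform distribution on [m, 100m] and adds Gaussian noise to the linear predictor. The scale $m = 5\sqrt{2\log(p)/n}$, the independent predictors, and the way the noise variance is derived from snr are assumptions made for this sketch, not a copy of the package's implementation.

```r
# Sketch of the Gaussian data model Y = X beta + epsilon (illustrative only).
set.seed(1)
n <- 200
p <- 20
support.size <- 5
snr <- 10

m <- 5 * sqrt(2 * log(p) / n)              # assumed coefficient scale
x <- matrix(rnorm(n * p), n, p)            # independent predictors
beta <- rep(0, p)
nonzero <- sample(p, support.size)         # positions of nonzero coefficients
beta[nonzero] <- runif(support.size, m, 100 * m)

signal <- drop(x %*% beta)
sigma2 <- var(signal) / snr                # noise variance from the SNR
y <- signal + rnorm(n, sd = sqrt(sigma2))
```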
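For family = "binomial", the data model is
$$\mathrm{Prob}(Y = 1) = \exp(X\beta + \epsilon)/(1 + \exp(X\beta + \epsilon)).$$
The underlying regression coefficient $\beta$ has
uniform distribution [2m, 10m] and $m = 5\sqrt{2\log(p)/n}$.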
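The binomial case differs only in the last step: the noisy linear predictor is passed through the logistic function and used as a success probability. The sketch below assumes the same scale $m = 5\sqrt{2\log(p)/n}$ and independent predictors; it is meant as an illustration rather than the package's code.

```r
# Sketch of the binomial data model (illustrative only).
set.seed(1)
n <- 200
p <- 20
support.size <- 5
snr <- 10

m <- 5 * sqrt(2 * log(p) / n)              # assumed coefficient scale
x <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)
beta[sample(p, support.size)] <- runif(support.size, 2 * m, 10 * m)

signal <- drop(x %*% beta)
eta <- signal + rnorm(n, sd = sqrt(var(signal) / snr))  # noisy linear predictor
prob <- plogis(eta)                        # exp(eta) / (1 + exp(eta))
y <- rbinom(n, size = 1, prob = prob)      # binary response
```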
For family = "poisson", the data is modeled to have
an exponential distribution:
$$Y \sim \mathrm{Exp}(\exp(X\beta + \epsilon)).$$
The underlying regression coefficient $\beta$ has
uniform distribution [2m, 10m] and $m = \sqrt{2\log(p)/n}/3$.
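For family = "gamma", the data is modeled to have
a gamma distribution:
$$Y \sim \mathrm{Gamma}(X\beta + \epsilon + 10, shape),$$
where $shape$ is the shape parameter of the gamma distribution.
The underlying regression coefficient $\beta$ has
uniform distribution [2m, 100m] and $m = 5\sqrt{2\log(p)/n}$.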
For family = "ordinal", the data is modeled to have
an ordinal distribution.
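For family = "cox", the model for failure time $T$ is
$$T = (-\log(U)/\exp(X\beta))^{1/weibull.shape},$$
where $U$ is a uniform random variable with range [0, 1].
The censoring time $C$ is generated from the
uniform distribution $[0, uniform.max]$,
then we define the censoring status as $\delta = I(T \le C)$
and the observed time as $R = \min\{T, C\}$.
The underlying regression coefficient $\beta$ has
uniform distribution [2m, 10m],
where $m = 5\sqrt{2\log(p)/n}$.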
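The censoring mechanism for the Cox case can be sketched in base R. The sketch assumes the standard inverse-transform construction for Weibull failure times, a censoring time drawn uniformly on [0, uniform.max], and the scale $m = 5\sqrt{2\log(p)/n}$; the variable names are illustrative and the code is not taken from the package.

```r
# Sketch of censored survival data under a Cox model with Weibull baseline.
set.seed(1)
n <- 200
p <- 20
support.size <- 5
weibull.shape <- 1
uniform.max <- 1

m <- 5 * sqrt(2 * log(p) / n)                        # assumed coefficient scale
x <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)
beta[sample(p, support.size)] <- runif(support.size, 2 * m, 10 * m)

u <- runif(n)                                        # U ~ Uniform(0, 1)
failure_time <- (-log(u) / exp(drop(x %*% beta)))^(1 / weibull.shape)
censor_time <- runif(n, min = 0, max = uniform.max)  # censoring time C

status <- as.integer(failure_time <= censor_time)    # 1 = event observed
observed_time <- pmin(failure_time, censor_time)     # R = min(T, C)
censoring_rate <- 1 - mean(status)
```

Raising `uniform.max` pushes the censoring times later, so more failures are observed and `censoring_rate` drops.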
For family = "mgaussian", the data model is
$$Y = X\beta + E.$$
The nonzero values of the regression matrix $\beta$ are sampled from
uniform distribution [m, 100m] and $m = 5\sqrt{2\log(p)/n}$.
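For the multivariate Gaussian case, a minimal base-R sketch follows. It assumes a row-sparse coefficient matrix (the same predictors are active for every response column), independent noise columns with a common variance, and a noise variance set from the average signal variance; these choices, like the scale $m = 5\sqrt{2\log(p)/n}$, are assumptions for illustration only.

```r
# Sketch of the multivariate Gaussian data model Y = X B + E (illustrative only).
set.seed(1)
n <- 200
p <- 20
support.size <- 5
y.dim <- 3
snr <- 10

m <- 5 * sqrt(2 * log(p) / n)                 # assumed coefficient scale
x <- matrix(rnorm(n * p), n, p)
B <- matrix(0, p, y.dim)                      # p x q coefficient matrix
nonzero <- sample(p, support.size)            # active predictors (rows of B)
B[nonzero, ] <- runif(support.size * y.dim, m, 100 * m)

signal <- x %*% B
sigma2 <- mean(apply(signal, 2, var)) / snr   # one simple way to set the noise level
E <- matrix(rnorm(n * y.dim, sd = sqrt(sigma2)), n, y.dim)  # independent columns
y <- signal + E
```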
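For family = "multinomial", the data model is
$$\mathrm{Prob}(Y = 1) = \exp(X\beta + E)/(1 + \exp(X\beta + E)).$$
The nonzero values of the regression coefficient $\beta$ have
uniform distribution [2m, 10m] and $m = 5\sqrt{2\log(p)/n}$.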
In the above models, $\epsilon \sim N(0, \sigma^2)$ and
$E \sim MVN(0, \sigma^2 \times I_{q \times q})$,
where $\sigma^2$ is determined by the snr and q is y.dim.
Value
A list object comprising:
x |
Design matrix of predictors. |
y |
Response variable. |
beta |
The coefficients used in the underlying regression model. |
Author(s)
Jin Zhu
Examples
# Generate simulated data
n <- 200
p <- 20
support.size <- 5
dataset <- generate.data(n, p, support.size)
str(dataset)
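The family-specific arguments listed under Usage can be exercised in the same way. The calls below are a hedged sketch that assumes the abess package is installed; they are guarded so that nothing runs otherwise, and the object names `dataset_cox` and `dataset_mg` are illustrative.

```r
# Sketch: other response families (runs only if abess is available).
if (requireNamespace("abess", quietly = TRUE)) {
  # Censored survival data; weibull.shape and uniform.max apply to "cox" only.
  dataset_cox <- abess::generate.data(
    n = 200, p = 20, support.size = 5,
    family = "cox", weibull.shape = 1, uniform.max = 1, seed = 1
  )
  str(dataset_cox$y)

  # Multivariate response; y.dim applies to "mgaussian" only.
  dataset_mg <- abess::generate.data(
    n = 200, p = 20, support.size = 5,
    family = "mgaussian", y.dim = 3, seed = 1
  )
  str(dataset_mg$y)
}
```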