simulate_data {lgpr} | R Documentation |
Generate an artificial longitudinal data set
Description
Generate an artificial longitudinal data set.
Usage
simulate_data(
  N,
  t_data,
  covariates = c(),
  names = NULL,
  relevances = c(1, 1, rep(1, length(covariates))),
  n_categs = rep(2, sum(covariates %in% c(2, 3))),
  t_jitter = 0,
  lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))),
  f_var = 1,
  noise_type = "gaussian",
  snr = 3,
  phi = 1,
  gamma = 0.2,
  N_affected = round(N/2),
  t_effect_range = "auto",
  t_observed = "after_0",
  c_hat = 0,
  dis_fun = "gp_warp_vm",
  bin_kernel = FALSE,
  steepness = 0.5,
  vm_params = c(0.025, 1),
  continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)),
  N_trials = 1,
  force_zeromean = TRUE
)
Arguments
N |
Number of individuals. |
t_data |
Measurement times (same for each individual, unless
|
covariates |
Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:
|
names |
Covariate names. |
relevances |
Relative relevance of each component. Must be a vector
so that |
n_categs |
An integer vector defining the number of categories
for each categorical covariate, so that |
t_jitter |
Standard deviation of the jitter added to the given measurement times. |
lengthscales |
A vector so that |
f_var |
Variance of f. |
noise_type |
Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial). |
snr |
The desired signal-to-noise ratio. This argument is valid
only when |
phi |
The inverse overdispersion parameter for negative binomial data.
The variance is |
gamma |
The dispersion parameter for beta-binomial data. |
N_affected |
Number of diseased individuals that are affected by the
disease. This defaults to the number of diseased individuals. This argument
can only be given if |
t_effect_range |
Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual. |
t_observed |
Determines how the disease effect time is observed. This
can be any function that takes the real disease effect time as an argument
and returns the (possibly randomly generated) observed onset/initiation time.
Alternatively, this can be a string of the form |
c_hat |
A constant added to f. |
dis_fun |
A function or a string that defines the disease effect. If
this is a function, that function is used to generate the effect.
If |
bin_kernel |
Should the binary kernel be used for categorical
covariates? If this is |
steepness |
Steepness of the input warping function. This is only used if the disease component is in the model. |
vm_params |
Parameters of the variance mask function. This is only
needed if |
continuous_info |
Info for generating continuous covariates. Must be a
list containing fields
|
N_trials |
The number of trials parameter for binomial data. |
force_zeromean |
Should each component (excluding the disease age component) be forced to have a zero mean? |
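The default values of relevances, n_categs, lengthscales and N_affected in Usage are computed from covariates and N, so their lengths change with the covariate specification. The following base R sketch evaluates those default expressions for a hypothetical covariates vector. The vector c(0, 2, 3) is chosen only for illustration, and the comments on what each entry corresponds to are inferred from the default expressions, not taken from the package.

```r
# Hypothetical covariate specification, chosen only for illustration
covariates <- c(0, 2, 3)

# Default expressions copied from the Usage section
relevances   <- c(1, 1, rep(1, length(covariates)))
n_categs     <- rep(2, sum(covariates %in% c(2, 3)))
lengthscales <- rep(12, 2 + sum(covariates %in% c(0, 1, 2)))

length(relevances)    # 5: two leading entries plus one per extra covariate
length(n_categs)      # 2: one entry per covariate of type 2 or 3
length(lengthscales)  # 4: two base entries plus one per covariate of type 0, 1 or 2

# N_affected defaults to round(N/2); base R rounds halves to the even digit
N <- 5
N_affected <- round(N / 2)
N_affected            # 2, not 3
```

When overriding one of these arguments, it is probably safest to supply a vector of the same length as the corresponding default.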
Value
An object of class lgpsim.
Examples
# Generate Gaussian data
dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)
# Generate negative binomially (NB) distributed count data
dat <- simulate_data(
  N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb",
  phi = 2
)
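
The snr argument used in the Gaussian example can be illustrated outside the package. The sketch below assumes that the signal-to-noise ratio means the variance of the noiseless signal divided by the noise variance; this definition is an assumption and may differ from what simulate_data does internally. The signal f and the name sigma_n are hypothetical.

```r
set.seed(1)

# Hypothetical noiseless signal
f <- sin(seq(0, 2 * pi, length.out = 200))

snr <- 3

# Assumed relation: snr = var(f) / sigma_n^2
sigma_n <- sqrt(var(f) / snr)

# Noisy observations with Gaussian noise of standard deviation sigma_n
y <- f + rnorm(length(f), mean = 0, sd = sigma_n)

# Ratio of signal variance to noise variance recovers the requested snr
var(f) / sigma_n^2
```

Under this assumption, a larger snr gives observations that follow the signal more closely.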