generate.data.for.demonstration {funreg} | R Documentation |
Generate data for some demonstration examples
Description
Simulates a dataset with two functional covariates, four subject-level scalar covariates, and a binary outcome.
Usage
generate.data.for.demonstration(
  nsub = 400,
  b0.true = -5,
  b1.true = 0,
  b2.true = +1,
  b3.true = -1,
  b4.true = +1,
  nobs = 500,
  observe.rate = 0.1
)
Arguments
nsub |
The number of subjects in the simulated dataset. |
b0.true |
The true value of the intercept. |
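b1.true |
The true value of the regression coefficient for the first subject-level covariate (s1). |
b2.true |
The true value of the regression coefficient for the second subject-level covariate (s2). |
b3.true |
The true value of the regression coefficient for the third subject-level covariate (s3). |
b4.true |
The true value of the regression coefficient for the fourth subject-level covariate (s4). |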
nobs |
The total number of possible observation times. |
observe.rate |
The average proportion of those possible times at which any given subject is observed. |
Value
Returns a data.frame with nobs rows for each subject, giving the values of two time-varying covariates on a dense grid of nobs observation times. It also contains an id variable, four subject-level covariates (s1, ..., s4), and one subject-level response (y), which are replicated across all of a subject's rows.
Each row also contains its observation time (time), the smooth latent values of the covariates (true.x1 and true.x2), versions of them observed with error (x1 and x2), and the local values of the functional regression coefficients (true.betafn1 and true.betafn2).
Lastly, each row has a random value for include.in.subsample, indicating whether it should be treated as an observed data point (versus an unobserved moment in the simulated subject's life). include.in.subsample is generated as a Bernoulli random variable with success probability observe.rate.
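The Bernoulli indicator described above can be mimicked in base R. The following is a minimal sketch, not the package's actual source: it builds a stand-in long-format data frame holding only id and time columns (the time grid on [0, 1] is an assumption made for illustration) and draws include.in.subsample independently for each row with probability observe.rate.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Defaults taken from the Usage section
nsub <- 400
nobs <- 500
observe.rate <- 0.1

# Stand-in for the long-format layout: one row per subject per possible time.
# The real function returns more columns; the time scale here is assumed.
sketch <- data.frame(
  id   = rep(seq_len(nsub), each = nobs),
  time = rep(seq(0, 1, length.out = nobs), times = nsub)
)

# One Bernoulli(observe.rate) draw per row, stored as a logical flag
sketch$include.in.subsample <-
  rbinom(nrow(sketch), size = 1, prob = observe.rate) == 1

# Share of rows flagged as observed; should be close to observe.rate
mean(sketch$include.in.subsample)
```

Because the draws are independent across rows, the number of observed rows differs from subject to subject; only its average is fixed by observe.rate.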
Note
nobs is the number of simulated data rows per simulated subject. It should be large because the x covariates are conceptually smooth functions of time. However, the simulated data analyses use only a small random subset of the generated time points, because this is more realistic for many behavioral and medical science datasets.
Thus, the number of possible observation times per subject is nobs, and the mean number of actual observation times per subject is nobs times observe.rate (50 with the default settings).
This smaller 'observed' dataset can be obtained by deleting from the dataset those observations having include.in.subsample==FALSE.
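As a hedged illustration of the subsetting step, the sketch below defines a small helper, observed.subset (a hypothetical name, not part of funreg), and applies it to a toy data frame that has only the id, time, and include.in.subsample columns. The comparison with TRUE is used so that the helper works whether the indicator is stored as a logical or as 0/1, since this page does not say which.

```r
# Hypothetical helper (not part of funreg): keep only rows flagged as observed
observed.subset <- function(full.data) {
  keep <- full.data$include.in.subsample == TRUE
  full.data[keep, , drop = FALSE]
}

set.seed(2)  # arbitrary seed so the sketch is reproducible
nsub <- 50
nobs <- 200
observe.rate <- 0.1

# Toy stand-in for the simulated data; the real data frame has more columns
toy <- data.frame(
  id   = rep(seq_len(nsub), each = nobs),
  time = rep(seq_len(nobs), times = nsub)
)
toy$include.in.subsample <- rbinom(nrow(toy), 1, observe.rate) == 1

observed <- observed.subset(toy)

# Observed rows per subject; factor() keeps subjects with zero observed rows
n.per.subject <- as.integer(table(factor(observed$id, levels = seq_len(nsub))))

# Average count per subject; expected to be near nobs * observe.rate
mean(n.per.subject)
```

With funreg loaded, the same helper would be applied to the real simulated data, for example observed.subset(generate.data.for.demonstration()).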