make_one_dataset {SimTimeVar}    R Documentation
Simulate time-varying covariates
Description
Simulates a dataset of correlated time-varying covariates with an exchangeable within-cluster correlation structure. Covariates can be normal or binary and can be static within a cluster or time-varying; a single categorical variable can also be generated. Time-varying normal variables can optionally follow linear trajectories within each cluster.
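As a rough illustration of the cluster structure described here and under Details, the base-R sketch below simulates one normal covariate and one time-function covariate by hand. It is not how make_one_dataset() is implemented; all numeric values are hypothetical, and the names across.SD, within.SD and error.SD are borrowed from the parameters columns only for readability. In this sketch, drawing cluster means with SD across.SD and adding within-cluster noise with SD within.SD gives an exchangeable within-cluster correlation of across.SD^2 / (across.SD^2 + within.SD^2).

```r
# Illustrative sketch only (not the SimTimeVar internals): one normal
# covariate and one time-function covariate, simulated in base R.
set.seed(1)

n   <- 2000   # number of clusters (hypothetical value)
obs <- 10     # observations per cluster (hypothetical value)
cluster    <- rep(seq_len(n), each = obs)
time.index <- rep(seq_len(obs) - 1, times = n)  # 0, 1, ..., obs - 1

# Normal variable: cluster means ~ N(across.mean, across.SD^2) and
# observations ~ N(cluster mean, within.SD^2).
across.mean <- 50; across.SD <- 4; within.SD <- 3
cluster.mean <- rnorm(n, across.mean, across.SD)
x <- cluster.mean[cluster] + rnorm(n * obs, 0, within.SD)

# Exchangeable within-cluster correlation implied by this random-intercept setup.
icc.theory <- across.SD^2 / (across.SD^2 + within.SD^2)

# Empirical check: correlation between the 1st and 2nd observation of each cluster.
x.mat <- matrix(x, nrow = n, ncol = obs, byrow = TRUE)  # row i = cluster i
icc.empirical <- cor(x.mat[, 1], x.mat[, 2])

# Time-function variable: baseline ~ N(100, 10^2), slope ~ N(-0.5, 0.2^2),
# plus independent N(0, error.SD^2) noise at each observation.
b0 <- rnorm(n, 100, 10)
b1 <- rnorm(n, -0.5, 0.2)
error.SD <- 1
y <- b0[cluster] + b1[cluster] * time.index + rnorm(n * obs, 0, error.SD)
```

With these values the theoretical within-cluster correlation is 16 / 25 = 0.64, and the empirical estimate should land close to it.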
Usage
make_one_dataset(n, obs, n.TBins, pcor, wcor, parameters, cat.parameters)
Arguments
n
The number of clusters.
obs
The number of observations per cluster.
n.TBins
The number of time-varying binary variables.
pcor
The across-cluster (population) correlation matrix. See Details.
wcor
The within-cluster correlation matrix. See Details.
parameters
A data frame containing the general simulation parameters. See Details.
cat.parameters
A data frame containing the parameters for the categorical variable. See Details.
Details
SPECIFYING THE PARAMETERS MATRIX
The data frame parameters contains the parameters required to generate all non-categorical variables.
It must contain the columns name, type, across.mean, across.SD, across.var, within.var, prop,
and error.SD. (To see an example, use data(params).) Each variable to be generated requires
either one or two rows in parameters, depending on the variable type.
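A small check like the following can catch a missing column before a simulation is attempted. The helper missing_param_cols() is hypothetical (it is not part of SimTimeVar) and checks only the eight column names listed above.

```r
# Hypothetical helper (not part of SimTimeVar): report which of the
# required columns listed above are absent from a parameters data frame.
required.param.cols <- c("name", "type", "across.mean", "across.SD",
                         "across.var", "within.var", "prop", "error.SD")

missing_param_cols <- function(parameters) {
  setdiff(required.param.cols, names(parameters))
}

# Example: a one-row frame that omits prop and error.SD.
partial <- data.frame(name = "male", type = "static.binary",
                      across.mean = NA, across.SD = NA,
                      across.var = NA, within.var = NA)
missing_param_cols(partial)  # returns c("prop", "error.SD")
```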
The possible variable types and their corresponding specifications are:
- Static binary variables do not change over time within a cluster. For example, if clusters are subjects, sex would be a static binary variable. Generating such a variable requires a single row of type static.binary, with prop set to the proportion of clusters for which the variable equals 1 and all other columns set to NA. (The correct standard deviation is computed automatically later.) For example, if the variable indicates that a subject is male, then prop specifies the proportion of males to be generated.
- Time-varying binary variables can change over time within a cluster, as for an indicator of whether a subject is currently taking the study drug. These variables require two rows in parameters. The first row should be of type static.binary, with prop representing the proportion of clusters for which the time-varying binary variable is 1 at least once (and all other columns set to NA). For example, this row could represent the proportion of subjects who ever take the study drug ("ever-users"). The second row should be of type subject.prop, with across.mean representing, for clusters that ever have a 1 for the binary variable, the proportion of observations within the cluster for which the variable equals 1. (All other columns should be set to NA.) For example, this row could represent the proportion of observations for which an ever-user is currently taking the drug. To indicate which two rows go together, the subject.prop row should have the same name as the static.binary row, but with the suffix _s appended (for example, the former could be named drug_s and the latter drug).
- Normal variables are normally distributed within a cluster, and the within-cluster means are themselves normally distributed in the population of clusters. Generating a normal variable requires specifying the population mean (across.mean) and standard deviation (across.SD) as well as the within-cluster standard deviation (within.SD). To generate a static continuous variable, set within.SD to a very small value (e.g., 1e-7) and set all of its correlations in wcor to 0.
- Time-function variables are linear functions of time (with normal error) within each cluster, and the within-cluster baseline values are normally distributed in the population of clusters. Generating a time-function variable requires two rows. The first should be of type time.function and specifies the population mean (across.mean) and standard deviation (across.SD) of the within-cluster baseline values as well as the error standard deviation (error.SD). The second should be of type normal and should have the same name as the time.function row, but with the suffix _s appended. This row specifies the mean (across.mean) and standard deviation (across.SD) of the within-cluster slopes.
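The row layout described above can be sketched as a base-R data frame with one variable of each type. All variable names and numeric values are hypothetical. A within.SD column is included because the normal-variable description refers to it, and across.var and within.var are left NA on the assumption (suggested by the Examples, which pass params through complete_parameters()) that derived columns are filled in later; check data(params) for the exact layout the package expects.

```r
# Sketch of a parameters data frame with one variable of each type.
# Names and values are hypothetical.
parameters <- data.frame(
  name = c("male",     # static binary
           "drug",     # time-varying binary: proportion of ever-users
           "drug_s",   # time-varying binary: proportion within ever-users
           "bmi",      # time-varying normal
           "height",   # static continuous (tiny within.SD)
           "sbp",      # time function: baseline values
           "sbp_s"),   # time function: slopes
  type = c("static.binary", "static.binary", "subject.prop",
           "normal", "normal", "time.function", "normal"),
  across.mean = c(NA, NA, 0.6, 27, 170, 120, 0.5),
  across.SD   = c(NA, NA, NA, 4, 10, 12, 0.1),
  across.var  = NA,  # assumed to be derived later
  within.var  = NA,  # assumed to be derived later
  within.SD   = c(NA, NA, NA, 2, 1e-7, NA, NA),
  prop        = c(0.5, 0.3, NA, NA, NA, NA, NA),
  error.SD    = c(NA, NA, NA, NA, NA, 5, NA),
  stringsAsFactors = FALSE
)

# Each subject.prop row pairs with a static.binary row named without "_s".
sp <- parameters$name[parameters$type == "subject.prop"]
stopifnot(all(sub("_s$", "", sp) %in%
              parameters$name[parameters$type == "static.binary"]))

# Each time.function row pairs with a normal row named with "_s" appended.
tf <- parameters$name[parameters$type == "time.function"]
stopifnot(all(paste0(tf, "_s") %in%
              parameters$name[parameters$type == "normal"]))
```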
SPECIFYING THE CATEGORICAL PARAMETERS MATRIX
The data frame cat.parameters contains the parameters required to generate the single categorical variable,
if any.
It must contain the columns level, parameter,
and beta. (To see an example, use data(cat.params).)
- The reference level: Each categorical variable must have exactly one "reference" level. The reference level should have one row in cat.parameters for which parameter is set to NA and beta is set to ref. For example, in the example data cat.params, which specifies parameters for generating a subject's race, the reference level is white.
- Other levels: Each other level of the categorical variable has one or more rows. One row, with parameter set to intercept and beta set to a numeric value, represents the intercept term in the corresponding multinomial model. Any additional rows, with parameter set to the names of other variables in the dataset and beta set to numeric values, represent the other coefficients in the corresponding multinomial model.
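Following the rules above, a cat.parameters data frame for a race variable with reference level white might look like the sketch below. The levels other than white, the covariate male, and all coefficient values are hypothetical. beta is stored as character here because it mixes ref with numbers; how the package expects this column to be typed is an assumption, so compare against data(cat.params).

```r
# Sketch of cat.parameters for a race variable; reference level "white".
# Other levels, the covariate "male", and the coefficients are hypothetical.
cat.parameters <- data.frame(
  level     = c("white", "black", "black", "asian", "asian"),
  parameter = c(NA, "intercept", "male", "intercept", "male"),
  beta      = c("ref", "-1.2", "0.3", "-2.0", "0.1"),
  stringsAsFactors = FALSE
)

# Exactly one reference row, and its parameter entry is NA.
ref.rows <- cat.parameters$beta == "ref"
stopifnot(sum(ref.rows) == 1, is.na(cat.parameters$parameter[ref.rows]))

# Every non-reference level has an intercept row (%in% treats NA as no match).
other.levels <- setdiff(unique(cat.parameters$level),
                        cat.parameters$level[ref.rows])
has.intercept <- vapply(other.levels, function(l)
  any(cat.parameters$level == l & cat.parameters$parameter %in% "intercept"),
  logical(1))
stopifnot(all(has.intercept))
```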
SPECIFYING THE POPULATION CORRELATION MATRIX
Matrix pcor specifies the population (i.e., across-cluster) correlation matrix. It should be a square matrix with
one row and one column for each row of parameters, using the same variable names in the same order.
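A pcor matching a seven-row parameters data frame can be sketched as follows, starting from independence and setting one correlation. The variable names and the 0.3 value are hypothetical. The final check uses general properties of any valid correlation matrix (symmetry, unit diagonal, no negative eigenvalues), not a documented requirement of the package.

```r
# Variable names in the same order as the rows of a (hypothetical)
# parameters data frame.
vars <- c("male", "drug", "drug_s", "bmi", "height", "sbp", "sbp_s")
k <- length(vars)

# Start from independence, then set one hypothetical correlation.
pcor <- diag(k)
dimnames(pcor) <- list(vars, vars)
pcor["bmi", "sbp"] <- pcor["sbp", "bmi"] <- 0.3

# A valid correlation matrix is symmetric, has a unit diagonal, and is
# positive semi-definite.
ev <- eigen(pcor, symmetric = TRUE, only.values = TRUE)$values
stopifnot(isSymmetric(pcor), all(diag(pcor) == 1), all(ev >= 0))
```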
SPECIFYING THE WITHIN-CLUSTER CORRELATION MATRIX
Matrix wcor specifies the within-cluster correlation matrix. The variables in this matrix should appear
in the same order as in parameters and pcor. However, static.binary and subject.prop variables
should not be included in wcor, since they are static within a cluster. Static continuous variables should be included,
but all of their correlations should be set to zero.
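Applying the exclusion rule above to a hypothetical parameters skeleton gives the variables for wcor; only the name and type columns matter for this step. The variable names and the 0.2 correlation are hypothetical, and height plays the role of a static continuous variable whose off-diagonal correlations stay at zero.

```r
# Hypothetical parameters skeleton: only name and type are needed here.
parameters <- data.frame(
  name = c("male", "drug", "drug_s", "bmi", "height", "sbp", "sbp_s"),
  type = c("static.binary", "static.binary", "subject.prop",
           "normal", "normal", "time.function", "normal"),
  stringsAsFactors = FALSE
)

# Drop static.binary and subject.prop rows, keeping the original order.
keep <- !parameters$type %in% c("static.binary", "subject.prop")
w.vars <- parameters$name[keep]

wcor <- diag(length(w.vars))
dimnames(wcor) <- list(w.vars, w.vars)
wcor["bmi", "sbp"] <- wcor["sbp", "bmi"] <- 0.2  # hypothetical value

# "height" is static continuous: its off-diagonal correlations remain 0.
static.continuous <- "height"
stopifnot(all(wcor[static.continuous,
                   setdiff(w.vars, static.continuous)] == 0))
```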
Examples
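library(SimTimeVar)
data(params); data(cat.params); data(pcor); data(wcor)  # example inputs
sim <- make_one_dataset(n = 10, obs = 10, n.TBins = 2, pcor = pcor, wcor = wcor,
  parameters = complete_parameters(params, n = 10), cat.parameters = cat.params)$data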