eesim {eesim} | R Documentation |
Simulate data, fit models, and assess models
Description
Generates synthetic time series datasets relevant for environmental epidemiology
studies and tests performance of a model on that simulated data.
Datasets can be generated with seasonal and long-term trends in either
exposure or outcome. Binary or continuous outcomes can be simulated or incorporated
from observed datasets. The function includes extensive options for customizing each
step of the simulation process; see the eesim vignette for more details and examples.
Usage
eesim(n_reps, n, rr, exposure_type, custom_model, central = NULL, sd = NULL,
exposure_trend = "no trend", exposure_slope = NULL, exposure_amp = NULL,
average_outcome = NULL, outcome_trend = "no trend",
outcome_slope = NULL, outcome_amp = NULL, start.date = "2000-01-01",
cust_exp_func = NULL, cust_exp_args = NULL, cust_expdraw = NULL,
cust_expdraw_args = NULL, cust_base_func = NULL,
cust_lambda_func = NULL, cust_base_args = NULL, cust_lambda_args = NULL,
cust_outdraw = NULL, cust_outdraw_args = NULL, custom_model_args = NULL)
Arguments
n_reps |
An integer specifying the number of datasets to simulate (e.g.,
|
n |
An integer specifying the number of days to simulate (e.g., |
rr |
A non-negative numeric value specifying the relative risk (i.e., the relative risk per unit increase in the exposure). |
exposure_type |
A character string specifying the type of exposure. Choices are "binary" or "continuous". |
custom_model |
The object name of an R function that defines the code that will be used to fit the model. This object name should not be in quotations. See Details for more. |
central |
A numeric value specifying the mean probability of exposure (for binary data) or the mean exposure value (for continuous data). |
sd |
A non-negative numeric value giving the standard deviation of the exposure values from the exposure trend line (not the total standard deviation of the exposure values). |
exposure_trend |
A character string specifying a seasonal and / or long-term trend for
expected mean exposure. See the vignette for
Options for binary exposure are:
|
exposure_slope |
A numeric value specifying the linear slope of the
exposure, to be used with |
exposure_amp |
A numeric value specifying the amplitude of the exposure trend. Must be between -1 and 1 for continuous exposure or between -0.5 and 0.5 for binary exposure. Positive values will simulate a pattern with higher values at the time of the year of the start of the dataset (typically January) and lowest values six months following that (typically July). Negative values can be used to simulate a trend with lower values at the time of year of the start of the dataset and higher values in the opposite season. |
average_outcome |
A non-negative numeric value specifying the average daily outcome count. |
outcome_trend |
A character string specifying the seasonal trend in health outcomes. Options are the same as for continuous exposure data. |
outcome_slope |
A numeric value specifying the linear slope of the
outcome trend, to be used with |
outcome_amp |
A numeric value specifying the amplitude of the outcome trend. Must be between -1 and 1. |
start.date |
A date in the format "yyyy-mm-dd" from which to begin simulating daily exposures. |
cust_exp_func |
An R object name specifying a custom trend function used to generate exposure data. |
cust_exp_args |
A list of arguments and their values for the user-specified custom exposure function. |
cust_expdraw |
An R object specifying a user-created function
which determines the distribution of random noise off of the trend line.
This function must have inputs |
cust_expdraw_args |
A list of arguments other than |
cust_base_func |
An R object name specifying a user-made custom function for baseline trend. |
cust_lambda_func |
An R object name specifying a user-made custom function for relating baseline, relative risk, and exposure. |
cust_base_args |
A list of arguments and their values used in the user-specified custom baseline function. |
cust_lambda_args |
A list of arguments and their values used in the user-specified custom lambda function. |
cust_outdraw |
An R object name specifying a user-created function to
randomize the outcome values off of the baseline for outcome values. This
function must take inputs |
cust_outdraw_args |
A list of arguments besides |
custom_model_args |
A list of arguments and their values for a custom
model. These arguments are passed through to the function specified with |
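The sign convention described for exposure_amp can be illustrated with a small base R sketch. It assumes a single annual cosine cycle of the form central * (1 + amp * cos(2 * pi * day / 365)). Both this formula and the helper name seasonal_mean are illustrative assumptions, not necessarily how eesim computes its trends.

```r
# Hypothetical helper (not part of eesim): expected mean exposure on each
# day under one annual cosine cycle, with the series starting on day 0.
seasonal_mean <- function(n, central, amp) {
  day <- seq_len(n) - 1
  central * (1 + amp * cos(2 * pi * day / 365))
}

pos <- seasonal_mean(n = 365, central = 100, amp = 0.6)
neg <- seasonal_mean(n = 365, central = 100, amp = -0.6)

# Positive amplitude: highest value at the start of the series and lowest
# value about six months later. Negative amplitude: the reverse.
which.max(pos)
which.min(pos)
which.min(neg)
range(pos)
```

With a start date in January, the positive-amplitude series in this sketch peaks in winter and bottoms out in midsummer, matching the description of exposure_amp.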
Value
A list object with three elements:
simulated_datasets: A list of length n_reps, in which each element is a data frame with one of the simulated time series datasets, created according to the specifications set by the user.
indiv_performance: A data frame with one row per simulated dataset (i.e., total number of rows equal to n_reps). Each row gives the results of fitting the specified model to one of the simulated datasets. See fit_mods for more on this output.
overall_performance: A one-row data frame with overall performance summaries from fitting the specified model to the synthetic datasets. See check_sims for more on this output.
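As a rough sketch of how the three-element return value might be navigated, the code below builds a mock list with the same top-level names and shapes. The object mock_sims and the columns inside its data frames are placeholders invented for illustration; the actual columns are documented under fit_mods and check_sims.

```r
# Mock object with the same top-level shape as the documented return value.
# The data frame columns are placeholders, not eesim's actual columns.
n_reps <- 3
mock_sims <- list(
  simulated_datasets = replicate(
    n_reps,
    data.frame(day = 1:5, value = rnorm(5)),
    simplify = FALSE
  ),
  indiv_performance = data.frame(rep = seq_len(n_reps),
                                 estimate = rnorm(n_reps)),
  overall_performance = data.frame(mean_estimate = 0)
)

names(mock_sims)
length(mock_sims$simulated_datasets)     # equals n_reps
head(mock_sims$simulated_datasets[[1]])  # first simulated dataset
nrow(mock_sims$indiv_performance)        # one row per simulated dataset
nrow(mock_sims$overall_performance)      # a single summary row
```

Accessing elements by name, as here, tends to be easier to read than positional indexing such as sims[[2]].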
References
Bateson TF, Schwartz J. 1999. Control for seasonal variation and time trend in case-crossover studies of acute effects of environmental exposures. Epidemiology 10(4):539-544.
Examples
# Run a simulation for a continuous exposure (mean = 100, standard
# deviation after long-term and seasonal trends = 10) that increases
# risk of a count outcome by 0.1% per unit increase, where the average
# daily outcome count is 22. The exposure has a seasonal trend,
# with higher values in the winter, while the outcome has no seasonal
# or long-term trends beyond those introduced through effects from the
# exposure. The simulated data are fit with a model defined by the `spline_mod`
# function (also in the `eesim` package), with its `df_year` argument set to 7.
sims <- eesim(n_reps = 3, n = 5 * 365, central = 100, sd = 10,
              exposure_type = "continuous", exposure_trend = "cos3",
              exposure_amp = .6, average_outcome = 22, rr = 1.001,
              custom_model = spline_mod, custom_model_args = list(df_year = 7))
names(sims)
sims[[2]]
sims[[3]]
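The example above passes the packaged spline_mod as custom_model. A user-written model function might look something like the following minimal sketch. It rests on several assumptions that should be checked against the Details section and the vignette: that each simulated data frame has columns named x (exposure) and outcome (daily count), that a named vector holding the estimate, standard error, test statistic, p-value, and confidence limits is an acceptable return value, and that outcomes follow expected count = baseline * rr^x. The names simple_mod and toy are hypothetical.

```r
# Hypothetical custom model function (not part of eesim). It assumes the
# data frame has columns `x` (exposure) and `outcome` (daily count).
simple_mod <- function(df, family = stats::quasipoisson(link = "log")) {
  mod <- stats::glm(outcome ~ x, data = df, family = family)
  est <- summary(mod)$coefficients["x", ]   # estimate, SE, statistic, p-value
  ci <- stats::confint.default(mod)["x", ]  # Wald 95% confidence interval
  c(est, ci)
}

# Stand-in dataset built by hand so the function can be tried without eesim.
# Assumed data-generating form: expected count = baseline * rr^x.
set.seed(1)
n <- 5 * 365
toy <- data.frame(
  date = as.Date("2000-01-01") + seq_len(n) - 1,
  x = rnorm(n, mean = 100, sd = 10)
)
toy$outcome <- rpois(n, lambda = 22 * 1.001^(toy$x - 100))

fit <- simple_mod(toy)
exp(fit["Estimate"])  # estimated relative risk per unit increase in x
```

If these assumptions hold, such a function could be supplied as custom_model = simple_mod, with any extra arguments (here, family) passed through custom_model_args.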