| sim_n_datasets {simDAG} | R Documentation |
Generate multiple datasets from a single DAG object
Description
This function takes a single DAG object and generates a list of multiple datasets, possibly using parallel processing.
Usage
sim_n_datasets(dag, n_sim, n_repeats, n_cores=parallel::detectCores(),
               data_format="raw", data_format_args=list(),
               seed=stats::runif(1), progressbar=TRUE, ...)
Arguments
dag |
A |
n_sim |
A single number specifying how many observations per dataset should be generated. |
n_repeats |
A single number specifying how many datasets should be generated. |
n_cores |
A single number specifying the number of cores that should be used. If |
data_format |
An optional character string specifying the output format of the generated datasets. If |
data_format_args |
An optional list of named arguments passed to the function specified by |
seed |
A seed for the random number generator. By supplying a value to this argument, the results will be replicable, even if parallel processing is used to generate the datasets (using |
progressbar |
Either |
... |
Further arguments passed to the |
Details
Generating a number of datasets from a single defined DAG object is usually the first step when conducting Monte Carlo simulation studies. This is a convenience function that automates this process, using parallel processing if requested.
Note that this function may not be ideal for more complex Monte Carlo simulations. Because it can only handle a single DAG, it does not allow the user to vary aspects of the data-generation mechanism inside the main for loop. For example, a user who wants to simulate n_repeats datasets with confounding and n_repeats datasets without confounding has to call this function twice. This is not optimal, because setting up the clusters for parallel processing takes some processing time on every call. If many different DAGs should be used, it makes more sense to write a single function that generates the DAG itself for each of the desired settings. This step cannot be automated by the package.
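The approach described above, writing one function that builds the DAG for each setting, might be sketched as follows. This is only an illustration under stated assumptions: make_dag() is a hypothetical helper, and the node names, coefficients, and settings are made up for the example. It uses a plain sequential loop with sim_from_dag rather than parallel processing.

```r
library(simDAG)

# hypothetical helper (not part of simDAG): builds a DAG in which "age"
# either does or does not influence the treatment assignment
make_dag <- function(confounded) {
  dag <- empty_dag() +
    node("age", type="rnorm", mean=50, sd=4)

  if (confounded) {
    # treatment probability depends on age
    dag <- dag +
      node("treatment", type="binomial", parents="age",
           betas=0.05, intercept=-2.5)
  } else {
    # treatment assigned independently of age
    dag <- dag +
      node("treatment", type="rbernoulli", p=0.5)
  }

  dag +
    node("outcome", type="binomial", parents=c("age", "treatment"),
         betas=c(0.02, 0.5), intercept=-2)
}

settings <- c(confounded=TRUE, unconfounded=FALSE)
n_repeats <- 5

set.seed(42)
results <- vector("list", length(settings))
names(results) <- names(settings)

# main loop: the DAG is rebuilt for every setting
for (i in seq_along(settings)) {
  dag_i <- make_dag(settings[[i]])
  results[[i]] <- lapply(seq_len(n_repeats), function(j) {
    sim_from_dag(dag_i, n_sim=100)
  })
}
```

Here results is a named list with one entry per setting, each holding n_repeats datasets. A parallel version would need its own cluster setup, which is the part this function otherwise handles.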
Value
Returns a list of length n_repeats containing datasets generated according to the supplied dag object.
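As a minimal sketch of how the returned list might be processed, the following computes one summary statistic per dataset. It assumes the default data_format="raw", in which each list element is a single dataset with one column per node. The DAG and the summary are chosen only for illustration.

```r
library(simDAG)

# small illustrative DAG with two root nodes
dag <- empty_dag() +
  node("age", type="rnorm", mean=10, sd=2) +
  node("sex", type="rbernoulli", p=0.5)

# sequential generation, progress bar switched off
out <- sim_n_datasets(dag, n_sim=100, n_repeats=10, n_cores=1,
                      progressbar=FALSE)

# one list element per repetition
length(out)

# mean of "age" in each generated dataset
mean_age <- vapply(out, function(d) mean(d$age), numeric(1))
```

Applying the same function to every element with vapply or lapply is a common pattern for the analysis step of a simulation study.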
Author(s)
Robin Denz
See Also
empty_dag, node, node_td, sim_from_dag, sim_discrete_time, sim2data
Examples
library(simDAG)
# some example DAG
dag <- empty_dag() +
  node("death", type="binomial", parents=c("age", "sex"), betas=c(1, 2),
       intercept=-10) +
  node("age", type="rnorm", mean=10, sd=2) +
  node("sex", parents="", type="rbernoulli", p=0.5) +
  node("smoking", parents=c("sex", "age"), type="binomial",
       betas=c(0.6, 0.2), intercept=-2)
# generate 10 datasets without parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100)
# generate 10 datasets with parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=2, n_sim=100)
# generate 10 datasets and transform the output
# (using the sim2data function internally)
dag <- dag + node_td("CV", type="time_to_event", prob_fun=0.01)
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100,
                      max_t=20, data_format="start_stop")