sim_n_datasets {simDAG} | R Documentation |
Generate multiple datasets from a single DAG object
Description
This function takes a single DAG object and generates a list of multiple datasets, possibly using parallel processing.
Usage
sim_n_datasets(dag, n_sim, n_repeats, n_cores=parallel::detectCores(),
data_format="raw", data_format_args=list(),
seed=stats::runif(1), progressbar=TRUE, ...)
Arguments
dag |
A |
n_sim |
A single number specifying how many observations per dataset should be generated. |
n_repeats |
A single number specifying how many datasets should be generated. |
n_cores |
A single number specifying how many cores should be used. If |
data_format |
An optional character string specifying the output format of the generated datasets. If |
data_format_args |
An optional list of named arguments passed to the function specified by |
seed |
A seed for the random number generator. By supplying a value to this argument, the results will be replicable, even if parallel processing is used to generate the datasets (using |
progressbar |
Either |
... |
Further arguments passed to the |
Details
Generating a number of datasets from a single defined dag object is usually the first step when conducting Monte Carlo simulation studies. This is simply a convenience function which automates this process, using parallel processing if specified.
Note that this function may not be ideal for more complex Monte Carlo simulations: because it can only handle a single dag, it does not allow the user to vary aspects of the data-generation mechanism inside the main for loop. For example, a user who wants to simulate n_repeats datasets with confounding and n_repeats datasets without confounding has to call this function twice. This is not optimal, because setting up the clusters for parallel processing takes some processing time. If many different dags should be used, it makes more sense to write a single function that generates the dag itself for each of the desired settings. Unfortunately, we cannot automate this step.
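The idea of a single function that generates the dag itself for each setting could be sketched roughly as follows. This is only an illustration under stated assumptions: make_dag, sim_one and the settings grid are hypothetical names that are not part of simDAG, the node parameters are arbitrary, and datasets are drawn one at a time with sim_from_dag rather than with sim_n_datasets.

```r
library(simDAG)

# hypothetical helper (not part of simDAG): returns a DAG in which age
# either does or does not influence smoking, i.e. with or without
# confounding of the smoking -> death relationship by age
make_dag <- function(confounded) {
  dag <- empty_dag() +
    node("age", type="rnorm", mean=10, sd=2)

  if (confounded) {
    dag <- dag +
      node("smoking", type="binomial", parents="age", betas=0.2,
           intercept=-2)
  } else {
    dag <- dag +
      node("smoking", type="rbernoulli", p=0.5)
  }

  dag + node("death", type="binomial", parents=c("age", "smoking"),
             betas=c(0.1, 1), intercept=-3)
}

# one row per simulation run: 2 settings x 5 repetitions (arbitrary sizes)
settings <- expand.grid(rep=seq_len(5), confounded=c(TRUE, FALSE))

# hypothetical helper: build the DAG for run i and draw one dataset from it
sim_one <- function(i) {
  sim_from_dag(make_dag(settings$confounded[i]), n_sim=100)
}

# a single loop over all runs, so every setting is handled in one place
out <- lapply(seq_len(nrow(settings)), sim_one)
```

Because all runs go through one loop, the lapply call could in principle be swapped for a parallel equivalent so that a cluster is set up only once. That variant is not shown here, since it would additionally require exporting the helpers to the workers and handling random seeds.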
Value
Returns a list of length n_repeats containing datasets generated according to the supplied dag object.
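As a minimal sketch of how the returned list might be used in a Monte Carlo study, the following computes one summary statistic per dataset. It assumes the default data_format="raw", so that each list element is a dataset with one column per node. The dag and the chosen summary (the proportion of deaths) are purely illustrative.

```r
library(simDAG)

# illustrative DAG with two root nodes and one binary outcome
dag <- empty_dag() +
  node("age", type="rnorm", mean=10, sd=2) +
  node("sex", type="rbernoulli", p=0.5) +
  node("death", type="binomial", parents=c("age", "sex"), betas=c(1, 2),
       intercept=-10)

# 10 datasets with 100 observations each, without parallel processing
out <- sim_n_datasets(dag, n_sim=100, n_repeats=10, n_cores=1,
                      progressbar=FALSE)

# the list has one element per repetition
length(out)

# proportion of deaths in each simulated dataset
p_death <- vapply(out, function(d) mean(d$death), numeric(1))
summary(p_death)
```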
Author(s)
Robin Denz
See Also
empty_dag, node, node_td, sim_from_dag, sim_discrete_time, sim2data
Examples
library(simDAG)
# some example DAG
dag <- empty_dag() +
  node("death", type="binomial", parents=c("age", "sex"), betas=c(1, 2),
       intercept=-10) +
  node("age", type="rnorm", mean=10, sd=2) +
  node("sex", parents="", type="rbernoulli", p=0.5) +
  node("smoking", parents=c("sex", "age"), type="binomial",
       betas=c(0.6, 0.2), intercept=-2)
# generate 10 datasets without parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100)
# generate 10 datasets with parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=2, n_sim=100)
# generate 10 datasets and transform the output
# (using the sim2data function internally)
dag <- dag + node_td("CV", type="time_to_event", prob_fun=0.01)
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100,
                      max_t=20, data_format="start_stop")