dag.sim2 {dagR} | R Documentation |
Simulate data based on a DAG.
Description
Simulates data according to a DAG object.
Usage
dag.sim2(dag, b = rep(0, nrow(dag$arc)), bxy = 0, n,
         distr = rep(0, length(dag$x)),
         mu = rep(0, length(dag$x)),
         stdev = rep(0, length(dag$x)),
         nu = NA,
         lambda = NA,
         binary = NA,
         naming = 2, seed = NA, verbose = FALSE)
Arguments
dag |
The DAG object according to which data is to be simulated. |
b |
Vector of coefficients defining the direct effects of the DAG arcs (on linear scale). |
bxy |
Coefficient defining the direct effect of main exposure X on outcome Y (on linear scale). |
n |
Number of observations to be simulated. |
distr |
0 for Normal distribution continuous nodes, |
mu |
Vector of means that are to be simulated for the different DAG nodes: |
stdev |
Vector of standard deviations for each node. |
nu |
Not used. |
lambda |
Not used. |
binary |
For backwards compatibility: Vector indicating which nodes are to be continuous (=0) or binary (=1). If given, it is passed on to the argument "distr" and a warning is issued. |
naming |
If =2, the alternative DAG node symbols are used for naming the variables in the output dataframe. Otherwise, the output dataframe variables are named X1...Xn. |
seed |
Seed to initialize the random number generator. |
verbose |
If =TRUE, additional output is given during the simulation, in particular showing the different calculation steps. |
Value
A data frame with n observations (rows) containing simulated data for each node (columns) of the DAG.
Simulation steps:
1. Simulate data for nodes i without ancestors by drawing from a Normal distribution with mean mu[i] and standard deviation stdev[i]
(continuous node), or from Bernoulli trials with success probability mu[i] (binary node).
2. Simulate data for nodes i for which all ancestors have already been simulated: multiply the ancestor values
by the corresponding arc coefficients and sum them up; shift the resulting values to the mean mu[i] specified for the
currently simulated node (logit-transformed if the node is binary based on the logistic model; exceptions: distr=1.1 or
distr=2.1, as detailed in "mu" above); add noise drawn from a Normal distribution with mean 0
and standard deviation stdev[i]; finally, if the node is binary, use the resulting values (after applying the inverse logit, if based on the logistic model)
as success probabilities for simulating the binary data.
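The two steps above can be sketched in base R as follows. This is only an illustration of the described logic, not dagR's internal code: the variable names, coefficients and parameter values are hypothetical, and "shifting to the mean" is assumed here to mean centring the weighted sum and adding the target mean.

```r
# Illustrative sketch only; NOT dagR's implementation.
# All names and numeric values below are hypothetical.
set.seed(1)
n <- 1000

# Step 1: node without ancestors, continuous, mu = 2 and stdev = 1.
c1 <- rnorm(n, mean = 2, sd = 1)

# Step 2 (continuous node): one incoming arc with coefficient 0.5.
lin_x <- 0.5 * c1                          # weighted sum of ancestor values
lin_x <- lin_x - mean(lin_x) + 10          # assumed centring: shift to mu = 10
x <- lin_x + rnorm(n, mean = 0, sd = 2)    # add Normal noise with stdev = 2

# Step 2 (binary node, logistic model): arcs from c1 and x.
lin_y <- 0.3 * c1 + 0.2 * x                # weighted sum of ancestor values
lin_y <- lin_y - mean(lin_y) + qlogis(0.25)    # shift to the logit of mu = 0.25
lin_y <- lin_y + rnorm(n, mean = 0, sd = 0.5)  # noise is added on the logit scale
p_y <- plogis(lin_y)                       # inverse logit gives success probabilities
y <- rbinom(n, size = 1, prob = p_y)       # draw the binary data

sim <- data.frame(c1 = c1, x = x, y = y)
```

Because y is computed from the already noisy x, the noise in x carries over into its descendants.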
As the noise is added after shifting to the mean, the mean of the simulated data will not exactly equal mu[i]. Also, the noise is added before descendant nodes are calculated, i.e. it represents true inter-individual variation rather than measurement error.
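A small base-R sketch of why the simulated mean is not exact, again assuming that the shift is a simple centring of the weighted sum (the names and values are hypothetical and not taken from dagR):

```r
# Illustrative sketch only; NOT dagR's implementation.
set.seed(42)
n <- 500
target_mu <- 5                                 # hypothetical target mean

lin <- 0.8 * rnorm(n)                          # weighted ancestor values
shifted <- lin - mean(lin) + target_mu         # mean equals target_mu up to rounding
noisy <- shifted + rnorm(n, mean = 0, sd = 1)  # noise added after the shift

dev_shifted <- mean(shifted) - target_mu       # numerically zero
dev_noisy <- mean(noisy) - target_mu           # equals the sample mean of the noise
```

The deviation of the noisy mean is just the sample mean of the noise, so it shrinks as n grows but does not vanish in a finite sample.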
For the risk difference model, the success probability calculated by summing the weighted ancestors can easily be <0 (or >1).
If this happens, the probability is set to 0 (or 1), and a warning is issued.
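The truncation described above can be sketched as follows. The helper clamp_prob() is hypothetical and not part of dagR, and the baseline risk and risk difference are made-up values chosen so that some probabilities exceed 1.

```r
# Illustrative sketch only; clamp_prob() is a hypothetical helper, NOT a dagR function.
clamp_prob <- function(p) {
  if (any(p < 0 | p > 1)) {
    warning("success probabilities outside [0, 1] were set to 0 or 1")
  }
  pmin(pmax(p, 0), 1)                    # truncate to the interval [0, 1]
}

set.seed(7)
parent <- rbinom(200, size = 1, prob = 0.5)
p_raw <- 0.9 + 0.3 * parent              # assumed baseline risk 0.9, risk difference 0.3
p <- suppressWarnings(clamp_prob(p_raw)) # values of 1.2 become 1
child <- rbinom(200, size = 1, prob = p)
```

Choosing coefficients and means so that the summed probabilities stay within [0, 1] avoids the truncation and the distortion of the intended effects that comes with it.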
Note
Undirected arcs are ignored in these simulations.
Author(s)
Lutz P Breitling <l.breitling@posteo.de>
References
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Duan C, Dragomir AD, Luta G, Breitling LP (2022). Reflection on modern methods: Understanding bias and data
analytical strategies through DAG-based data simulations. Int J Epidemiol 50(6):2091-2097.