simulateInterventions {CompareCausalNetworks} | R Documentation |
Simulate data of a causal (possibly cyclic) model under interventions.
Description
Simulate data of a causal (possibly cyclic) model under interventions.
Usage
simulateInterventions(
n,
p,
df,
rhoNoise,
snrPar,
sparse,
doInterv,
numberInt,
strengthInt,
cyclic,
strengthCycle,
modelMis = FALSE,
modelMisPar = 1,
seed = 1
)
Arguments
n |
Number of observations. |
p |
Number of variables. |
df |
Degrees of freedom in t-distribution of noise and interventions. |
rhoNoise |
Correlation between noise terms to model hidden variables. Set to 0 for independent noise. |
snrPar |
Signal-to-noise parameter: steers what proportion of the variance stems from
the signal and what proportion from the noise. The SNR is given by SNR = (1 - snrPar)/snrPar. |
sparse |
Probability that an entry of the adjacency matrix is non-zero, i.e. that an edge is added; see details. |
doInterv |
Set to TRUE if interventions should be do-interventions; otherwise noise interventions (also called shift interventions) are generated. |
numberInt |
Total number of settings. |
strengthInt |
Regulates the strength of the interventions, see details. |
cyclic |
Set to TRUE if the resulting graph should contain a cycle. |
strengthCycle |
Steers strength of feedback, see details. |
modelMis |
Set to TRUE to add a model misspecification that applies a marginal transformation to all variables, see details. |
modelMisPar |
Parameter steering the strength of the model misspecification. |
seed |
Random seed. |
Details
The adjacency matrix A is generated as follows. Assume the variables
with indices 1, ..., p are causally ordered. For each possible edge from
node k' to node k, where k' precedes k in the causal ordering, we draw a
sample from Bin(1, sparse) to determine whether to add an edge from node k'
to node k. After having sampled the non-zero entries of A in this fashion,
we sample the coefficients from Unif(-1,1). As described below, the edge
weights are later rescaled to achieve a specified signal-to-noise ratio.
We exclude the possibility of A = 0, i.e. we resample until A contains at
least one non-zero entry.
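The sampling scheme above can be sketched in a few lines of R. This is an illustration, not the package's internal code: the helper name `sample_adjacency` is hypothetical, and the convention that `A[i, j]` holds the weight of the edge from node i to node j (with causal order 1, ..., p) is an assumption.

```r
# Hypothetical helper, not part of CompareCausalNetworks.
# Assumed convention: A[i, j] is the weight of the edge from node i to node j,
# and the causal order is 1, ..., p, so only entries with i < j can be non-zero.
sample_adjacency <- function(p, sparse) {
  stopifnot(p >= 2, sparse > 0, sparse <= 1)
  upper <- upper.tri(matrix(0, p, p))        # TRUE where i < j
  nCand <- sum(upper)                        # number of candidate edges
  repeat {
    A <- matrix(0, p, p)
    edge <- rbinom(nCand, size = 1, prob = sparse)       # one Bernoulli draw per candidate edge
    A[upper] <- edge * runif(nCand, min = -1, max = 1)   # coefficients from Unif(-1, 1)
    if (any(A != 0)) return(A)               # resample until A is not the zero matrix
  }
}

set.seed(1)
A <- sample_adjacency(p = 5, sparse = 0.3)
```

Larger values of `sparse` give denser graphs under this reading; the later rescaling of the weights is not part of this sketch.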
Second, the interventions are generated as follows. numberInt denotes the
total number of (interventional and observational) settings that are generated.
For each variable, we sample uniformly at random with replacement one setting
in which this variable is intervened on. In other words, each variable is
intervened on in exactly one setting. Hence it is possible that there are
settings where no interventions take place, which then correspond to the
observational case. Similarly, there may be settings where interventions
are performed on multiple variables at once. After defining the settings,
we sample (uniformly at random with replacement) which setting each data point
belongs to, so for each setting we generate approximately the same number of
samples. In one generated data set, the interventions are all of the same
type, i.e. they are either all shift interventions (when doInterv = FALSE)
or all do-interventions (when doInterv = TRUE). In both cases, an intervention
on X_k is modelled by generating the intervention variable Z_k as
strengthInt * t(df), i.e. a draw from a t-distribution with df degrees of
freedom scaled by strengthInt. If strengthInt = 0, all interventional settings
correspond to purely observational data.
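A minimal sketch of this assignment step, assuming the intervention variables are t-distributed draws scaled by strengthInt. All variable names are illustrative and chosen to mirror the elements of the returned list; none of this is the package's own code.

```r
# Illustrative sketch; names mirror the documentation but are assumptions.
set.seed(1)
n <- 200; p <- 4; numberInt <- 3; strengthInt <- 2; df <- 5

# One setting per variable, sampled uniformly at random with replacement.
settingOfVar <- sample(numberInt, size = p, replace = TRUE)

# Variables intervened on in each setting (may be empty: observational setting).
whereInt <- lapply(seq_len(numberInt), function(s) which(settingOfVar == s))

# Setting that each observation belongs to.
envIdx <- sample(numberInt, size = n, replace = TRUE)

# Intervention variables: strengthInt * t(df) on intervened coordinates, 0 elsewhere.
Z <- matrix(0, n, p)
for (k in seq_len(p)) {
  rows <- which(envIdx == settingOfVar[k])
  Z[rows, k] <- strengthInt * rt(length(rows), df = df)
}
```

With `strengthInt <- 0` the matrix `Z` is zero everywhere, which matches the remark that all settings then reduce to observational data in the shift case.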
Third, the noise terms ε are generated by first sampling from N(0, Σ),
where Σ has ones on the diagonal and rhoNoise in all off-diagonal entries.
To steer the signal-to-noise ratio, we set the variance of the noise terms
of all nodes except source nodes to snrPar, where 0 < snrPar <= 1. Stepping
through the variables in causal order, for each variable X_k that has parents,
we uniformly rescale the weights of the edges from its parents in the
structural equation of variable X_k such that the variance of the sum of the
parental signal and the noise ε_k is approximately 1 in the observational
setting. In other words, the parameter snrPar steers what proportion of the
variance stems from the signal given by the parents of X_k and what proportion
stems from the noise ε_k. The signal-to-noise ratio can then be computed
as SNR = (1 - snrPar)/snrPar.
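To make the role of snrPar concrete, the sketch below draws correlated Gaussian noise and rescales the incoming edge weights of a single variable. It assumes a noise covariance with unit variances and pairwise correlation rhoNoise, and it treats the signal and the noise of the rescaled variable as uncorrelated; the parent data and raw weights are made-up stand-ins.

```r
set.seed(1)
n <- 10000; p <- 3; rhoNoise <- 0.3; snrPar <- 0.25

# Correlated Gaussian noise: unit variances, pairwise correlation rhoNoise (assumed form).
Sigma <- matrix(rhoNoise, p, p)
diag(Sigma) <- 1
eps <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# SNR implied by snrPar, following SNR = (1 - snrPar)/snrPar.
snr <- (1 - snrPar) / snrPar

# Rescaling for one variable with two parents (stand-in data and weights).
parents <- matrix(rnorm(n * 2), n, 2)
w <- c(0.7, -0.4)                                  # raw weights, as if drawn from Unif(-1, 1)
signal <- drop(parents %*% w)
wScaled <- w * sqrt((1 - snrPar) / var(signal))    # same factor for all incoming edges
noiseK <- sqrt(snrPar) * rnorm(n)                  # noise with variance snrPar
xK <- drop(parents %*% wScaled) + noiseK
var(xK)                                            # approximately 1
```

Here the rescaled signal has variance 1 - snrPar and the noise has variance snrPar, so the two shares add up to a total variance of roughly 1.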
Fourth, a cycle is added to the causal graph if cyclic = TRUE. In that case,
we sample two nodes k and k' such that adding an edge between them creates
a cycle in the causal graph. We then compute the largest possible coefficient
for this edge such that the cycle product is smaller than 1. Subsequently,
we sample the sign of the coefficient and set the magnitude by scaling the
largest possible coefficient by strengthCycle, where 0 < strengthCycle < 1.
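For a single directed path, the bound on the feedback coefficient can be worked out directly. The sketch below assumes a chain 1 -> 2 -> 3 and closes it with an edge 3 -> 1; how the package treats graphs with several paths between the two nodes is not specified here, so this is only the simplest case. The convention `A[i, j]` = weight of edge i -> j is again an assumption.

```r
set.seed(1)
strengthCycle <- 0.5

# Chain 1 -> 2 -> 3 with fixed example weights.
A <- matrix(0, 3, 3)
A[1, 2] <- 0.8
A[2, 3] <- -0.6

# Product of the weights along the existing path from node 1 to node 3.
pathProduct <- A[1, 2] * A[2, 3]

# Largest magnitude of the new coefficient that keeps |cycle product| below 1.
maxCoef <- 1 / abs(pathProduct)

# Random sign, magnitude scaled by strengthCycle.
sgn <- sample(c(-1, 1), size = 1)
A[3, 1] <- sgn * strengthCycle * maxCoef

cycleProduct <- A[1, 2] * A[2, 3] * A[3, 1]
abs(cycleProduct)   # equals strengthCycle up to floating point, hence below 1
```

In this single-cycle case the absolute cycle product equals strengthCycle, so values close to 1 give strong feedback and values close to 0 give weak feedback.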
Fifth, we rescale the noise variables to obtain a t-distribution with
df degrees of freedom. X is then generated as X = (I - A)^{-1} ε
in the observational case; under a shift intervention, X can be generated as
X = (I - A)^{-1} (ε + Z), where the coordinates of Z are only non-zero for
the variables that are intervened on. Under a do-intervention on X_k, the
coefficients of all edges into X_k are set to 0 to yield A', and ε_k is set
to Z_k to yield ε'. We then obtain X as X = (I - A')^{-1} ε'.
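The three generating mechanisms (observational, shift intervention, do-intervention) can be compared on a small example. The sketch assumes an n x p data matrix with observations in rows and `A[i, j]` as the weight of the edge from i to j, so the inverse of (I - A) is applied from the right; the graph, weights and intervention strength are arbitrary illustrative values.

```r
set.seed(1)
n <- 500; p <- 3; df <- 5

# Example DAG 1 -> 2 -> 3 (assumed convention: A[i, j] is the edge i -> j).
A <- matrix(0, p, p)
A[1, 2] <- 0.8
A[2, 3] <- -0.5
Id <- diag(p)

# t-distributed noise, one row per observation.
eps <- matrix(rt(n * p, df = df), n, p)

# Observational data: each row x solves x = x A + eps.
Xobs <- eps %*% solve(Id - A)

# Shift intervention on variable 2: add Z to the noise.
Z <- matrix(0, n, p)
Z[, 2] <- 2 * rt(n, df = df)
Xshift <- (eps + Z) %*% solve(Id - A)

# Do-intervention on variable 2: cut incoming edges and replace its noise by Z.
Ado <- A
Ado[, 2] <- 0
epsDo <- eps
epsDo[, 2] <- Z[, 2]
Xdo <- epsDo %*% solve(Id - Ado)
```

Under the do-intervention the second column of `Xdo` equals `Z[, 2]` and no longer depends on variable 1, whereas under the shift intervention the dependence on variable 1 is kept and the shift propagates to variable 3.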
Lastly, if modelMis = TRUE, a model misspecification is added to the
data by marginally transforming all variables as
x -> tanh(modelMisPar * x)/modelMisPar.
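The effect of modelMisPar on this transformation is easy to inspect numerically. The helper name `misspecify` below is hypothetical and only wraps the formula given above.

```r
# Hypothetical helper wrapping the marginal transformation from the text.
misspecify <- function(x, modelMisPar = 1) {
  tanh(modelMisPar * x) / modelMisPar
}

x <- c(-3, -0.1, 0, 0.1, 3)
misspecify(x, modelMisPar = 0.01)   # nearly the identity
misspecify(x, modelMisPar = 1)      # large values are compressed towards -1 and 1
misspecify(x, modelMisPar = 2)      # stronger compression, bounded by 1/2 in absolute value
```

Since the absolute value of tanh is below 1, the transformed values are bounded by 1/modelMisPar, and for small modelMisPar the transformation approaches the identity.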
Value
A list with the following elements:
-
X
(n x p)-dimensional data matrix.
-
environment
Indicator of the experiment or the intervention type an observation belongs to. A numeric vector of length n.
-
interventions
A list of length n. Indicates location of interventions for each data point.
-
whereInt
A list of length numberInt. Indicates location of interventions in each setting.
-
noise
The generated noise terms.
-
configs
A list with the generated adjacency matrix (trueA) as well as all input arguments.
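A call with the documented signature might look as follows. The argument values are arbitrary choices for illustration, and the block only runs if the CompareCausalNetworks package is installed; the accessed elements are those listed under Value.

```r
# Guarded so the snippet is skipped when the package is not installed.
if (requireNamespace("CompareCausalNetworks", quietly = TRUE)) {
  sim <- CompareCausalNetworks::simulateInterventions(
    n = 200, p = 5, df = 5, rhoNoise = 0, snrPar = 0.5, sparse = 0.3,
    doInterv = FALSE, numberInt = 3, strengthInt = 1,
    cyclic = FALSE, strengthCycle = 0.5, seed = 1
  )
  dim(sim$X)               # data matrix, one row per observation
  table(sim$environment)   # number of observations per setting
  sim$whereInt             # variables intervened on in each setting
  sim$configs$trueA        # generated adjacency matrix
}
```

Setting `doInterv = TRUE` in the same call would generate do-interventions instead of shift interventions.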