SimulateStructural {fake} | R Documentation |
Data simulation for Structural Causal Modelling
Description
Simulates data from a multivariate Normal distribution where relationships between the variables correspond to a Structural Causal Model (SCM). To ensure that the generated SCM is identifiable, the nodes are organised by layers, with no causal effects within layers.
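The layered structure described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `layered_dag` is a hypothetical helper, and it assumes that `theta[i, j] = 1` encodes an edge from node i to node j and that edges between layers are drawn independently with probability `nu_between`.

```r
# Hypothetical helper (not part of the package): random DAG organised in layers
layered_dag <- function(pk, nu_between = 0.5) {
  p <- sum(pk)
  layer <- rep(seq_along(pk), times = pk) # layer membership of each node
  node_names <- paste0("x", seq_len(p))
  theta <- matrix(0, nrow = p, ncol = p, dimnames = list(node_names, node_names))
  # Edge i -> j is allowed only if node i is in an earlier layer than node j
  allowed <- outer(layer, layer, FUN = "<")
  theta[allowed] <- rbinom(sum(allowed), size = 1, prob = nu_between)
  theta
}

set.seed(1)
theta <- layered_dag(pk = c(2, 3), nu_between = 0.5)
print(theta)

# No edges within layers and none pointing back to an earlier layer
layer <- rep(1:2, times = c(2, 3))
stopifnot(all(theta[outer(layer, layer, FUN = ">=")] == 0))
```

Because nodes are ordered by layer, the resulting matrix is strictly upper triangular, so the graph cannot contain a cycle.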
Usage
SimulateStructural(
  n = 100,
  pk = c(5, 5, 5),
  theta = NULL,
  n_manifest = NULL,
  nu_between = 0.5,
  v_between = c(0.5, 1),
  v_sign = c(-1, 1),
  continuous = TRUE,
  ev = 0.5,
  ev_manifest = 0.8,
  output_matrices = FALSE
)
Arguments
n |
number of observations in the simulated dataset. |
pk |
vector of the number of (latent) variables per layer. |
theta |
optional binary adjacency matrix of the Directed Acyclic Graph
(DAG) of causal relationships. This DAG must have a structure with layers
so that a variable can only be a parent of a variable in one of the following
layers (see |
n_manifest |
vector of the number of manifest (observed) variables
measuring each of the latent variables. If |
nu_between |
probability of having an edge between two nodes belonging
to different layers, as defined in |
v_between |
vector defining the (range of) nonzero path coefficients. If
|
v_sign |
vector of possible signs for path coefficients. Possible inputs
are: |
continuous |
logical indicating whether to sample path coefficients from
a uniform distribution between the minimum and maximum values in
|
ev |
vector of proportions of variance in each of the (latent) variables
that can be explained by its parents. If there are no latent variables (if
|
ev_manifest |
vector of proportions of variance in each of the manifest
variables that can be explained by their latent parent. Only used if
|
output_matrices |
logical indicating if the true path coefficients, residual variances, and precision and (partial) correlation matrices should be included in the output. |
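To make the role of a proportion of explained variance such as `ev` concrete, the following base R sketch simulates one child variable from two parents and picks the residual variance so that a target proportion of variance is explained. It only illustrates the idea and does not reproduce the package's internals; the path coefficients and the use of independent standard Normal parents are assumptions made for the example.

```r
set.seed(1)
n <- 10000
ev <- 0.7                 # target proportion of explained variance
beta <- c(0.8, -0.6)      # hypothetical path coefficients

# Two independent standard Normal parents (assumption for this sketch)
parents <- matrix(rnorm(2 * n), ncol = 2)

# Variance explained by the parents: t(beta) %*% Cov(parents) %*% beta,
# which reduces to sum(beta^2) because Cov(parents) is the identity here
var_explained <- sum(beta^2)

# Residual variance solving ev = var_explained / (var_explained + var_residual)
var_residual <- var_explained * (1 - ev) / ev

child <- drop(parents %*% beta) + rnorm(n, sd = sqrt(var_residual))

# Observed R-squared, expected to be close to ev for large n
r2 <- summary(lm(child ~ parents))$r.squared
print(r2)
```

The same relationship, read in the other direction, gives the check used in the Examples: one minus the ratio of residual variance to total variance.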
Value
A list with:
data |
simulated data with |
theta |
adjacency matrix of the simulated Directed Acyclic Graph encoding causal relationships. |
Amat |
simulated (true) asymmetric matrix A in RAM notation. Only
returned if |
Smat |
simulated (true) symmetric matrix S in RAM notation. Only returned if
|
Fmat |
simulated (true) filter matrix F
in RAM notation. Only returned if |
sigma |
simulated (true) covariance matrix. Only returned if
|
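As a reminder of how the three RAM matrices relate to the covariance matrix, the sketch below builds a small model by hand and applies the standard RAM identity, sigma = F (I - A)^{-1} S t((I - A)^{-1}) t(F). The matrices are made up for illustration, and the sketch assumes the usual RAM orientation in which `A[i, j]` is the path coefficient from variable j to variable i; whether the returned `Amat` uses this orientation is not confirmed here.

```r
p <- 3
nm <- c("x1", "x2", "x3")

# Asymmetric matrix A: A[i, j] is the path coefficient of j -> i (assumed convention)
A <- matrix(0, nrow = p, ncol = p, dimnames = list(nm, nm))
A["x3", "x1"] <- 0.8
A["x3", "x2"] <- -0.6

# Symmetric matrix S: variances of exogenous variables and residual variances
S <- diag(c(1, 1, 0.25))
dimnames(S) <- list(nm, nm)

# Filter matrix F: selects the observed variables (all three here)
Fmat <- diag(p)
dimnames(Fmat) <- list(nm, nm)

# Implied covariance matrix from the RAM identity
B <- solve(diag(p) - A) # (I - A)^{-1}
sigma <- Fmat %*% B %*% S %*% t(B) %*% t(Fmat)
print(sigma)

# Proportion of variance in x3 explained by its parents
ev_x3 <- 1 - S["x3", "x3"] / sigma["x3", "x3"]
print(ev_x3)
```

In this toy model x3 has total variance 0.64 + 0.36 + 0.25 = 1.25, of which 0.25 is residual.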
References
Jacobucci R, Grimm KJ, McArdle JJ (2016). “Regularized structural equation modeling.” Structural equation modeling: a multidisciplinary journal, 23(4), 555–566. doi:10.1080/10705511.2016.1154793.
See Also
SimulatePrecision, MakePositiveDefinite, Contrast
Other simulation functions: SimulateAdjacency(), SimulateClustering(), SimulateComponents(), SimulateCorrelation(), SimulateGraphical(), SimulateRegression()
Examples
# Simulation of a layered SCM
set.seed(1)
pk <- c(3, 5, 4)
simul <- SimulateStructural(n = 100, pk = pk)
print(simul)
summary(simul)
plot(simul)
# Choosing the proportions of explained variances for endogenous variables
set.seed(1)
simul <- SimulateStructural(
  n = 1000,
  pk = c(2, 3),
  nu_between = 1,
  ev = c(NA, NA, 0.5, 0.7, 0.9),
  output_matrices = TRUE
)
# Checking expected proportions of explained variances
1 - simul$Smat["x3", "x3"] / simul$sigma["x3", "x3"]
1 - simul$Smat["x4", "x4"] / simul$sigma["x4", "x4"]
1 - simul$Smat["x5", "x5"] / simul$sigma["x5", "x5"]
# Checking observed proportions of explained variances (R-squared)
summary(lm(simul$data[, 3] ~ simul$data[, which(simul$theta[, 3] != 0)]))
summary(lm(simul$data[, 4] ~ simul$data[, which(simul$theta[, 4] != 0)]))
summary(lm(simul$data[, 5] ~ simul$data[, which(simul$theta[, 5] != 0)]))
# Simulation including latent and manifest variables
set.seed(1)
simul <- SimulateStructural(
  n = 100,
  pk = c(2, 3),
  n_manifest = c(2, 3, 2, 1, 2)
)
plot(simul)
# Showing manifest variables in red
if (requireNamespace("igraph", quietly = TRUE)) {
  mygraph <- plot(simul)
  ids <- which(igraph::V(mygraph)$name %in% colnames(simul$data))
  igraph::V(mygraph)$color[ids] <- "red"
  igraph::V(mygraph)$frame.color[ids] <- "red"
  plot(mygraph)
}
# Choosing proportions of explained variances for latent and manifest variables
set.seed(1)
simul <- SimulateStructural(
  n = 100,
  pk = c(3, 2),
  n_manifest = c(2, 3, 2, 1, 2),
  ev = c(NA, NA, NA, 0.7, 0.9),
  ev_manifest = 0.8,
  output_matrices = TRUE
)
plot(simul)
# Checking expected proportions of explained variances
(simul$sigma_full["f4", "f4"] - simul$Smat["f4", "f4"]) / simul$sigma_full["f4", "f4"]
(simul$sigma_full["f5", "f5"] - simul$Smat["f5", "f5"]) / simul$sigma_full["f5", "f5"]
(simul$sigma_full["x1", "x1"] - simul$Smat["x1", "x1"]) / simul$sigma_full["x1", "x1"]