SimulateGraphical {fake} | R Documentation |
Data simulation for Gaussian Graphical Modelling
Description
Simulates data from a Gaussian Graphical Model (GGM).
Usage
SimulateGraphical(
  n = 100,
  pk = 10,
  theta = NULL,
  implementation = HugeAdjacency,
  topology = "random",
  nu_within = 0.1,
  nu_between = NULL,
  nu_mat = NULL,
  v_within = c(0.5, 1),
  v_between = c(0.1, 0.2),
  v_sign = c(-1, 1),
  continuous = TRUE,
  pd_strategy = "diagonally_dominant",
  ev_xx = NULL,
  scale_ev = TRUE,
  u_list = c(1e-10, 1),
  tol = .Machine$double.eps^0.25,
  scale = TRUE,
  output_matrices = FALSE,
  ...
)
Arguments
n |
number of observations in the simulated dataset. |
pk |
vector of the number of variables per group in the simulated
dataset. The number of nodes in the simulated graph is sum(pk). |
theta |
optional binary and symmetric adjacency matrix encoding the conditional independence structure. |
implementation |
function for simulation of the graph. By default,
algorithms implemented in |
topology |
topology of the simulated graph. If using
|
nu_within |
probability of having an edge between two nodes belonging to
the same group, as defined in pk. |
nu_between |
probability of having an edge between two nodes belonging
to different groups, as defined in pk. |
nu_mat |
matrix of probabilities of having an edge between nodes
belonging to a given pair of node groups defined in pk. |
v_within |
vector defining the (range of) nonzero entries in the
diagonal blocks of the precision matrix. These values must be between -1
and 1 if |
v_between |
vector defining the (range of) nonzero entries in the
off-diagonal blocks of the precision matrix. This argument is the same as
|
v_sign |
vector of possible signs for precision matrix entries. Possible
inputs are: |
continuous |
logical indicating whether to sample precision values from
a uniform distribution between the minimum and maximum values in
|
pd_strategy |
method to ensure that the generated precision matrix is
positive definite (so that its inverse is a valid covariance matrix). If
|
ev_xx |
expected proportion of explained variance by the first Principal
Component (PC1) of a Principal Component Analysis. This is the largest
eigenvalue of the correlation (if |
scale_ev |
logical indicating if the proportion of explained variance by
PC1 should be computed from the correlation ( |
u_list |
vector with two numeric values defining the range of values to explore for constant u. |
tol |
accuracy for the search of parameter u as defined in
|
scale |
logical indicating if the true mean is zero and true variance is one for all simulated variables. The observed mean and variance may be slightly off by chance. |
output_matrices |
logical indicating if the true precision and (partial) correlation matrices should be included in the output. |
... |
additional arguments passed to the graph simulation function
provided in implementation. |
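The quantity described for ev_xx, the proportion of variance explained by PC1, can be illustrated with a short base-R sketch. The helper name ExplainedVariancePC1 is hypothetical and not part of the package. The sketch assumes that scale_ev = TRUE corresponds to the correlation matrix and scale_ev = FALSE to the covariance matrix.

```r
# Hypothetical helper (not part of the package): proportion of variance
# explained by the first Principal Component of a numeric matrix x.
ExplainedVariancePC1 <- function(x, scale_ev = TRUE) {
  # Assumption: scale_ev = TRUE uses the correlation, FALSE the covariance
  mat <- if (scale_ev) cor(x) else cov(x)
  ev <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  # Largest eigenvalue divided by the sum of all eigenvalues
  ev[1] / sum(ev)
}

set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
x[, 2] <- x[, 1] + 0.5 * x[, 2] # induce some correlation
ev_pc1 <- ExplainedVariancePC1(x)

# Cross-check against prcomp() on standardised variables
pca <- prcomp(x, scale. = TRUE)
ev_pca <- pca$sdev[1]^2 / sum(pca$sdev^2)
```

Both ev_pc1 and ev_pca should agree up to numerical precision, since the squared standard deviations returned by prcomp() with scale. = TRUE are the eigenvalues of the correlation matrix.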
Details
The simulation is done in two steps: (i) generation of a graph, and (ii) sampling from a multivariate Normal distribution for which nonzero entries in the partial correlation matrix correspond to the edges of the simulated graph. This procedure ensures that the conditional independence structure between the variables corresponds to the simulated graph.
Step 1 is done using SimulateAdjacency.
In Step 2, the precision matrix (inverse of the covariance matrix) is
simulated using SimulatePrecision so that (i) its nonzero
entries correspond to edges in the graph simulated in Step 1, and (ii) it
is positive definite (see MakePositiveDefinite). The inverse
of the precision matrix is used as covariance matrix to simulate data from
a multivariate Normal distribution.
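The two steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the edge probability nu, the value range, and the diagonal constant u are illustrative choices, and positive definiteness is obtained here by strict diagonal dominance.

```r
# Minimal base-R sketch of the two-step simulation (not the package's code)
set.seed(1)
n <- 100  # number of observations
p <- 10   # number of nodes
nu <- 0.1 # edge probability (illustrative)

# Step 1: random symmetric binary adjacency matrix with zero diagonal
theta <- matrix(0, nrow = p, ncol = p)
n_upper <- p * (p - 1) / 2
theta[upper.tri(theta)] <- rbinom(n_upper, size = 1, prob = nu)
theta <- theta + t(theta)

# Step 2a: off-diagonal precision entries are nonzero only on edges
omega <- matrix(0, nrow = p, ncol = p)
omega[upper.tri(omega)] <- runif(n_upper, min = 0.5, max = 1) *
  sample(c(-1, 1), size = n_upper, replace = TRUE)
omega <- (omega + t(omega)) * theta

# Step 2b: strict diagonal dominance with a positive diagonal
# guarantees a positive definite symmetric matrix
u <- 0.1 # illustrative positive constant
diag(omega) <- rowSums(abs(omega)) + u

# Step 2c: invert the precision matrix and sample from N(0, sigma)
sigma <- solve(omega)
z <- matrix(rnorm(n * p), nrow = n, ncol = p)
x <- z %*% chol(sigma) # rows of x have covariance sigma
```

Multiplying standard Normal draws by the Cholesky factor of sigma is one common way to sample from a multivariate Normal distribution; other samplers would serve equally well.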
The outputs of this function can be used to evaluate the ability of a graphical model to recover the conditional independence structure.
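As an illustration of such an evaluation, the sketch below compares an estimated adjacency matrix with the true one in terms of precision and recall over the edges. The helper name EdgeRecovery and the estimated matrix theta_hat are hypothetical and used for illustration only.

```r
# Hypothetical helper (not part of the package): edge-wise precision and
# recall of an estimated adjacency matrix against the true one.
EdgeRecovery <- function(theta_hat, theta) {
  upper <- upper.tri(theta) # each undirected edge is counted once
  truth <- theta[upper] != 0
  selected <- theta_hat[upper] != 0
  tp <- sum(selected & truth)  # correctly recovered edges
  fp <- sum(selected & !truth) # spurious edges
  fn <- sum(!selected & truth) # missed edges
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# True graph with edges 1-2, 1-3 and 3-4
theta <- matrix(c(
  0, 1, 1, 0,
  1, 0, 0, 0,
  1, 0, 0, 1,
  0, 0, 1, 0
), ncol = 4, byrow = TRUE)

# Made-up estimate with edges 1-2, 1-4 and 3-4
theta_hat <- matrix(c(
  0, 1, 0, 1,
  1, 0, 0, 0,
  0, 0, 0, 1,
  1, 0, 1, 0
), ncol = 4, byrow = TRUE)

perf <- EdgeRecovery(theta_hat, theta)
```

Here two of the three selected edges are true and two of the three true edges are recovered, so both metrics equal 2/3. The metrics are undefined (NaN) when no edge is selected or when the true graph is empty.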
Value
A list with:
data |
simulated data with n observations and sum(pk) variables. |
theta |
adjacency matrix of the simulated graph. |
omega |
simulated (true) precision matrix. Only returned if
output_matrices=TRUE. |
phi |
simulated (true) partial
correlation matrix. Only returned if output_matrices=TRUE. |
sigma |
simulated (true) covariance matrix. Only returned if
output_matrices=TRUE. |
u |
value of the constant u used for the
simulation of |
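The relationship between the precision, partial correlation and covariance matrices can be sketched in base R using the standard definitions. The toy precision matrix below is made up for illustration, and the convention of setting the diagonal of the partial correlation matrix to one is an assumption that may differ from the package's output.

```r
# Toy precision matrix for a chain graph 1 - 2 - 3 (illustrative values)
omega <- matrix(c(
  2, -1, 0,
  -1, 2, -1,
  0, -1, 2
), ncol = 3, byrow = TRUE)

# Partial correlations: phi_ij = -omega_ij / sqrt(omega_ii * omega_jj)
phi <- -cov2cor(omega)
diag(phi) <- 1 # assumed convention for the diagonal

# Covariance matrix: inverse of the precision matrix
sigma <- solve(omega)

# Variables 1 and 3 are conditionally independent given variable 2
# (zero partial correlation) but marginally correlated
phi[1, 3]
sigma[1, 3]
```

Zeros in the partial correlation matrix match absent edges, whereas the covariance matrix is generally dense, which is why the graph is encoded in the precision matrix and not in the covariance.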
References
Bodinier B, Filippi S, Nost TH, Chiquet J, Chadeau-Hyam M (2021). “Automated calibration for stability selection in penalised regression and graphical models: a multi-OMICs network application exploring the molecular response to tobacco smoking.” https://arxiv.org/abs/2106.02521.
See Also
SimulatePrecision, MakePositiveDefinite
Other simulation functions: SimulateAdjacency(), SimulateClustering(), SimulateComponents(), SimulateCorrelation(), SimulateRegression(), SimulateStructural()
Examples
oldpar <- par(no.readonly = TRUE)
par(mar = rep(7, 4))
# Simulation of random graph with 50 nodes
set.seed(1)
simul <- SimulateGraphical(n = 100, pk = 50, topology = "random", nu_within = 0.05)
print(simul)
plot(simul)
# Simulation of scale-free graph with 20 nodes
set.seed(1)
simul <- SimulateGraphical(n = 100, pk = 20, topology = "scale-free")
plot(simul)
# Extracting true precision/correlation matrices
set.seed(1)
simul <- SimulateGraphical(
  n = 100, pk = 20,
  topology = "scale-free", output_matrices = TRUE
)
str(simul)
# Simulation of multi-block data
set.seed(1)
pk <- c(20, 30)
simul <- SimulateGraphical(
  n = 100, pk = pk,
  pd_strategy = "min_eigenvalue"
)
mycor <- cor(simul$data)
Heatmap(mycor,
  col = c("darkblue", "white", "firebrick3"),
  legend_range = c(-1, 1), legend_length = 50,
  legend = FALSE, axes = FALSE
)
for (i in 1:2) {
  axis(side = i, at = c(0.5, pk[1] - 0.5), labels = NA)
  axis(
    side = i, at = mean(c(0.5, pk[1] - 0.5)),
    labels = ifelse(i == 1, yes = "Group 1", no = "Group 2"),
    tick = FALSE, cex.axis = 1.5
  )
  axis(side = i, at = c(pk[1] + 0.5, sum(pk) - 0.5), labels = NA)
  axis(
    side = i, at = mean(c(pk[1] + 0.5, sum(pk) - 0.5)),
    labels = ifelse(i == 1, yes = "Group 2", no = "Group 1"),
    tick = FALSE, cex.axis = 1.5
  )
}
# User-defined function for graph simulation
CentralNode <- function(pk, hub = 1) {
  theta <- matrix(0, nrow = sum(pk), ncol = sum(pk))
  theta[hub, ] <- 1
  theta[, hub] <- 1
  diag(theta) <- 0
  return(theta)
}
simul <- SimulateGraphical(n = 100, pk = 10, implementation = CentralNode)
plot(simul) # star
simul <- SimulateGraphical(n = 100, pk = 10, implementation = CentralNode, hub = 2)
plot(simul) # variable 2 is the central node
# User-defined adjacency matrix
mytheta <- matrix(c(
  0, 1, 1, 0,
  1, 0, 0, 0,
  1, 0, 0, 1,
  0, 0, 1, 0
), ncol = 4, byrow = TRUE)
simul <- SimulateGraphical(n = 100, theta = mytheta)
plot(simul)
# User-defined adjacency and block structure
simul <- SimulateGraphical(n = 100, theta = mytheta, pk = c(2, 2))
mycor <- cor(simul$data)
Heatmap(mycor,
  col = c("darkblue", "white", "firebrick3"),
  legend_range = c(-1, 1), legend_length = 50, legend = FALSE
)
par(oldpar)