R: Simulation of a correlation matrix

SimulateCorrelation {fake}

R Documentation

Simulation of a correlation matrix

Description

Simulates a correlation matrix. This is done in three steps: (i) the simulation of an undirected graph encoding conditional independence, (ii) the simulation of a (positive definite) precision matrix given the graph, and (iii) the re-scaling of the inverse of the precision matrix.

Usage

SimulateCorrelation(
  pk = 10,
  theta = NULL,
  implementation = HugeAdjacency,
  topology = "random",
  nu_within = 0.1,
  nu_between = NULL,
  nu_mat = NULL,
  v_within = c(0.5, 1),
  v_between = c(0.1, 0.2),
  v_sign = c(-1, 1),
  continuous = TRUE,
  pd_strategy = "diagonally_dominant",
  ev_xx = NULL,
  scale_ev = TRUE,
  u_list = c(1e-10, 1),
  tol = .Machine$double.eps^0.25,
  output_matrices = FALSE,
  ...
)

Arguments

`pk`	vector of the number of variables per group in the simulated dataset. The number of nodes in the simulated graph is `sum(pk)`. With multiple groups, the simulated (partial) correlation matrix has a block structure, where blocks arise from the integration of the `length(pk)` groups. This argument is only used if `theta` is not provided.
`theta`	optional binary and symmetric adjacency matrix encoding the conditional independence structure.
`implementation`	function for simulation of the graph. By default, algorithms implemented in `huge.generator` are used. Alternatively, a user-defined function can be used. It must take `pk`, `topology` and `nu` as arguments and return a `sum(pk)` x `sum(pk)` binary and symmetric matrix for which diagonal entries are all equal to zero. This function is only applied if `theta` is not provided.
`topology`	topology of the simulated graph. If using `implementation=HugeAdjacency`, possible values are listed for the argument `graph` of `huge.generator`. These are: "random", "hub", "cluster", "band" and "scale-free".
`nu_within`	probability of having an edge between two nodes belonging to the same group, as defined in `pk`. If `length(pk)=1`, this is the expected density of the graph. If `implementation=HugeAdjacency`, this argument is only used for `topology="random"` or `topology="cluster"` (see argument `prob` in `huge.generator`). Only used if `nu_mat` is not provided.
`nu_between`	probability of having an edge between two nodes belonging to different groups, as defined in `pk`. By default, the same density is used for within and between blocks (`nu_within`=`nu_between`). Only used if `length(pk)>1`. Only used if `nu_mat` is not provided.
`nu_mat`	matrix of probabilities of having an edge between nodes belonging to a given pair of node groups defined in `pk`.
`v_within`	vector defining the (range of) nonzero entries in the diagonal blocks of the precision matrix. These values must be between -1 and 1 if `pd_strategy="min_eigenvalue"`. If `continuous=FALSE`, `v_within` is the set of possible precision values. If `continuous=TRUE`, `v_within` is the range of possible precision values.
`v_between`	vector defining the (range of) nonzero entries in the off-diagonal blocks of the precision matrix. This argument is the same as `v_within` but for off-diagonal blocks. It is only used if `length(pk)>1`.
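`v_sign`	vector of possible signs for precision matrix entries. Possible inputs are: `-1` for positive partial correlations, `1` for negative partial correlations, or `c(-1, 1)` for both positive and negative partial correlations. The signs are reversed because a partial correlation has the opposite sign to the corresponding off-diagonal entry of the precision matrix.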
`continuous`	logical indicating how precision values are sampled. If `continuous=TRUE`, values are sampled from a uniform distribution between the minimum and maximum values in `v_within` (diagonal blocks) or `v_between` (off-diagonal blocks). If `continuous=FALSE`, values are sampled from the set of proposed values in `v_within` (diagonal blocks) or `v_between` (off-diagonal blocks).
`pd_strategy`	method to ensure that the generated precision matrix is positive definite (so that its inverse is a valid covariance matrix). If `pd_strategy="diagonally_dominant"`, the precision matrix is made diagonally dominant by setting each diagonal entry to the sum of the absolute values of the off-diagonal entries in the corresponding row, plus a constant u. If `pd_strategy="min_eigenvalue"`, all diagonal entries are set to the absolute value of the smallest eigenvalue of the precision matrix with zeros on the diagonal, plus a constant u.
`ev_xx`	expected proportion of explained variance by the first Principal Component (PC1) of a Principal Component Analysis. This is the largest eigenvalue of the correlation (if `scale_ev=TRUE`) or covariance (if `scale_ev=FALSE`) matrix divided by the sum of eigenvalues. If `ev_xx=NULL` (the default), the constant u is chosen by maximising the contrast of the correlation matrix.
`scale_ev`	logical indicating if the proportion of explained variance by PC1 should be computed from the correlation (`scale_ev=TRUE`) or covariance (`scale_ev=FALSE`) matrix. If `scale_ev=TRUE`, the correlation matrix is used as parameter of the multivariate normal distribution.
`u_list`	vector with two numeric values defining the range of values to explore for constant u.
`tol`	accuracy for the search of parameter u as defined in `optimise`.
`output_matrices`	logical indicating if the true precision and (partial) correlation matrices should be included in the output.
`...`	additional arguments passed to the graph simulation function provided in `implementation`.
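
As an illustration of the requirements on a user-defined `implementation` function, here is a minimal base-R sketch. The name `ChainAdjacency` and its behaviour (a chain linking consecutive nodes, plus random extra edges drawn with probability `nu`) are hypothetical and not part of the package. The sketch assumes that `nu` is passed as a single probability, and it ignores `topology`.

```r
# Hypothetical graph simulator (not part of the fake package).
# It takes pk, topology and nu, and returns a sum(pk) x sum(pk) binary,
# symmetric adjacency matrix with zeros on the diagonal.
ChainAdjacency <- function(pk = 10, topology = "random", nu = 0.1, ...) {
  p <- sum(pk)
  theta <- matrix(0, nrow = p, ncol = p)

  # Random edges in the upper triangle, each present with probability nu
  theta[upper.tri(theta)] <- rbinom(p * (p - 1) / 2, size = 1, prob = nu)

  # Chain: always link node i to node i + 1
  if (p > 1) {
    theta[cbind(1:(p - 1), 2:p)] <- 1
  }

  # Symmetrise (the lower triangle and the diagonal are still zero here)
  theta <- theta + t(theta)
  return(theta)
}

set.seed(1)
theta <- ChainAdjacency(pk = c(3, 2), nu = 0.2)
print(theta)
```

With the package loaded, such a function would be supplied as `implementation = ChainAdjacency`. The resulting matrix could also be passed directly through the `theta` argument.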

Details

In Step 1, the conditional independence structure between the variables is simulated. This is done using SimulateAdjacency.

In Step 2, the precision matrix is simulated using SimulatePrecision so that (i) its nonzero entries correspond to edges in the graph simulated in Step 1, and (ii) it is positive definite (see MakePositiveDefinite).

In Step 3, the covariance is calculated as the inverse of the precision matrix. The correlation matrix is then obtained by re-scaling the covariance matrix (see cov2cor).
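
The three steps can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: the graph is a single-block random graph, the precision matrix is made positive definite by diagonal dominance, and the constant `u` is fixed at an arbitrary value instead of being calibrated. The variable names are chosen for this sketch only.

```r
set.seed(1)
p <- 5
n_upper <- p * (p - 1) / 2

# Step 1: random undirected graph
# (binary, symmetric adjacency matrix with zeros on the diagonal)
theta <- matrix(0, nrow = p, ncol = p)
theta[upper.tri(theta)] <- rbinom(n_upper, size = 1, prob = 0.5)
theta <- theta + t(theta)

# Step 2: precision matrix whose nonzero off-diagonal entries are the edges
omega <- matrix(0, nrow = p, ncol = p)
omega[upper.tri(omega)] <- runif(n_upper, min = 0.5, max = 1) *
  sample(c(-1, 1), size = n_upper, replace = TRUE)
omega <- (omega + t(omega)) * theta

# Diagonal dominance: each diagonal entry is the sum of absolute
# off-diagonal values in its row plus a constant u (arbitrary here)
u <- 0.1
diag(omega) <- rowSums(abs(omega)) + u

# Step 3: invert the precision matrix and re-scale to a correlation matrix
sigma <- cov2cor(solve(omega))
print(round(sigma, digits = 2))
```

A strictly diagonally dominant symmetric matrix with positive diagonal entries is positive definite, which is why `solve(omega)` yields a valid covariance matrix in this sketch.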

Value

A list with:

`sigma`	simulated correlation matrix.
`omega`	simulated precision matrix. Only returned if `output_matrices=TRUE`.
`theta`	adjacency matrix of the simulated graph. Only returned if `output_matrices=TRUE`.
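
To show how these three elements relate to each other, here is a base-R sketch built on a small hand-written precision matrix. The values are hypothetical and were not produced by `SimulateCorrelation`. It derives the correlation matrix, the adjacency matrix and the partial correlations from `omega`, and computes the proportion of variance explained by PC1 as defined for `ev_xx`.

```r
# Hand-written, positive definite precision matrix (illustrative values)
omega <- matrix(c(
  1.0, -0.4, 0.0,
  -0.4, 1.0, 0.3,
  0.0, 0.3, 1.0
), nrow = 3, byrow = TRUE)

# Correlation matrix: re-scaled inverse of the precision matrix
sigma <- cov2cor(solve(omega))

# Adjacency matrix: nonzero off-diagonal entries of the precision matrix
theta <- (omega != 0) * 1
diag(theta) <- 0

# Partial correlations have the opposite sign to precision entries:
# pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj)
pcor <- -cov2cor(omega)
diag(pcor) <- 1

# Proportion of variance explained by PC1 (correlation scale):
# largest eigenvalue divided by the sum of eigenvalues
ev <- eigen(sigma, symmetric = TRUE, only.values = TRUE)$values
ev_pc1 <- max(ev) / sum(ev)

print(round(sigma, digits = 2))
print(pcor)
print(ev_pc1)
```

In this sketch, the negative precision entry between variables 1 and 2 gives a positive partial correlation, matching the sign convention described for `v_sign`.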

Examples

oldpar <- par(no.readonly = TRUE)
par(mar = rep(7, 4))

# Random correlation matrix
set.seed(1)
simul <- SimulateCorrelation(pk = 10)
Heatmap(simul$sigma,
  col = c("navy", "white", "darkred"),
  text = TRUE, format = "f", digits = 2,
  legend_range = c(-1, 1)
)

# Correlation matrix with homogeneous block structure
set.seed(1)
simul <- SimulateCorrelation(
  pk = c(5, 5),
  nu_within = 1,
  nu_between = 0,
  v_sign = -1,
  v_within = 1
)
Heatmap(simul$sigma,
  col = c("navy", "white", "darkred"),
  text = TRUE, format = "f", digits = 2,
  legend_range = c(-1, 1)
)

# Correlation matrix with heterogeneous block structure
set.seed(1)
simul <- SimulateCorrelation(
  pk = c(5, 5),
  nu_within = 0.5,
  nu_between = 0,
  v_sign = -1
)
Heatmap(simul$sigma,
  col = c("navy", "white", "darkred"),
  text = TRUE, format = "f", digits = 2,
  legend_range = c(-1, 1)
)

par(oldpar)

[Package fake version 1.4.0 Index]