SimulateComponents {fake} | R Documentation |
Data simulation for sparse Principal Component Analysis
Description
Simulates data with independent groups of variables.
Usage
SimulateComponents(
  n = 100,
  pk = c(10, 10),
  adjacency = NULL,
  nu_within = 1,
  v_within = c(0.5, 1),
  v_sign = -1,
  continuous = TRUE,
  pd_strategy = "min_eigenvalue",
  ev_xx = 0.1,
  scale_ev = TRUE,
  u_list = c(1e-10, 1),
  tol = .Machine$double.eps^0.25,
  scale = TRUE,
  output_matrices = FALSE
)
Arguments
n |
number of observations in the simulated dataset. |
pk |
vector of the number of variables per group in the simulated
dataset. The number of nodes in the simulated graph is |
adjacency |
optional binary and symmetric adjacency matrix encoding the
conditional graph structure between variables. The clusters encoded in
this argument must be in line with those indicated in |
nu_within |
probability of having an edge between two nodes belonging to
the same group, as defined in |
v_within |
vector defining the (range of) nonzero entries in the
diagonal blocks of the precision matrix. These values must be between -1
and 1 if |
v_sign |
vector of possible signs for precision matrix entries. Possible
inputs are: |
continuous |
logical indicating whether to sample precision values from
a uniform distribution between the minimum and maximum values in
|
pd_strategy |
method to ensure that the generated precision matrix is
positive definite (and hence can be a covariance matrix). If
|
ev_xx |
expected proportion of explained variance by the first Principal
Component (PC1) of a Principal Component Analysis. This is the largest
eigenvalue of the correlation (if |
scale_ev |
logical indicating if the proportion of explained variance by
PC1 should be computed from the correlation ( |
u_list |
vector with two numeric values defining the range of values to explore for constant u. |
tol |
accuracy for the search of parameter u as defined in
|
scale |
logical indicating if the true mean is zero and true variance is one for all simulated variables. The observed mean and variance may be slightly off by chance. |
output_matrices |
logical indicating if the true precision and (partial) correlation matrices should be included in the output. |
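The difference between computing the proportion of variance explained by PC1 from the correlation matrix and from the covariance matrix can be illustrated in base R. This is a minimal sketch under stated assumptions: the matrix Sigma and the helper pc1_proportion() are hypothetical and are not part of the package.

```r
# Hypothetical covariance matrix with unequal variances (4, 1 and 0.25)
Sigma <- matrix(
  c(
    4.0, 1.2, 0.50,
    1.2, 1.0, 0.30,
    0.5, 0.3, 0.25
  ),
  nrow = 3, byrow = TRUE
)

# Hypothetical helper: largest eigenvalue divided by the sum of eigenvalues
pc1_proportion <- function(m) {
  values <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  max(values) / sum(values)
}

# Proportion of variance explained by PC1 on the covariance scale
ev_cov <- pc1_proportion(Sigma)

# Same quantity on the correlation scale (all variances rescaled to one)
ev_cor <- pc1_proportion(cov2cor(Sigma))

print(c(covariance = ev_cov, correlation = ev_cor))
```

In this toy matrix the first variable has a much larger variance than the others, so PC1 explains a larger proportion of variance on the covariance scale than on the correlation scale.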
Details
The data are simulated from a centered multivariate Normal distribution with a block-diagonal covariance matrix. Independence between variables from different blocks ensures that sparse orthogonal components can be generated.
The block-diagonal partial correlation matrix is obtained using a graph structure encoding the conditional independence between variables. The orthogonal latent variables are obtained from the eigendecomposition of the true correlation matrix. The sparse eigenvectors contain the weights of the linear combinations of variables used to construct the latent variables (loadings coefficients). The proportion of variance explained by each latent variable is computed from the eigenvalues.
As latent variables are defined from the true correlation matrix, the
number of sparse orthogonal components is not limited by the number of
observations and is equal to sum(pk).
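The construction described above can be sketched in base R. The sketch below is an illustration only and does not reproduce the internals of SimulateComponents(): the within-group correlations in rho are arbitrary, all object names are hypothetical, and the eigendecomposition is done block by block so that loadings are exactly zero outside their group.

```r
set.seed(1)
n <- 200                                # number of observations
pk <- c(3, 2)                           # two independent groups of variables
p <- sum(pk)
group <- rep(seq_along(pk), times = pk) # group membership of each variable
rho <- c(0.6, 0.3)                      # arbitrary within-group correlations

# Block-diagonal (true) correlation matrix
C <- matrix(0, nrow = p, ncol = p)
for (k in seq_along(pk)) {
  ids <- which(group == k)
  C[ids, ids] <- rho[k]
}
diag(C) <- 1

# Eigendecomposition of each diagonal block, stored in a p x p loadings matrix
loadings <- matrix(0, nrow = p, ncol = p)
eigenvalues <- numeric(p)
for (k in seq_along(pk)) {
  ids <- which(group == k)
  decomposition <- eigen(C[ids, ids, drop = FALSE], symmetric = TRUE)
  loadings[ids, ids] <- decomposition$vectors
  eigenvalues[ids] <- decomposition$values
}

# Proportion of variance explained by each latent variable
ev <- eigenvalues / sum(eigenvalues)

# Centered multivariate Normal data with correlation matrix C
# (chol() returns R with t(R) %*% R = C, so z %*% R has covariance C)
x <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(C)

print(round(loadings, 2))
print(round(ev, 2))
```

Here there are sum(pk) = 5 components whatever the value of n, and the largest eigenvalue (1 + 2 * 0.6 = 2.2) gives a proportion of explained variance of 0.44 for the corresponding component.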
Value
A list with:
data |
simulated data with |
loadings |
loadings coefficients of the orthogonal latent variables (principal components). |
theta |
support of the loadings coefficients. |
ev |
proportion of explained variance by each of the orthogonal latent variables. |
adjacency |
adjacency matrix of the simulated graph. |
omega |
simulated (true) precision
matrix. Only returned if |
phi |
simulated
(true) partial correlation matrix. Only returned if
|
C |
simulated (true) correlation
matrix. Only returned if |
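The relationship between a precision matrix, the partial correlations and the correlations can be sketched in base R using the standard definitions. This is an illustration on a hypothetical 3 x 3 precision matrix, not output from SimulateComponents(); the convention of setting the diagonal of the partial correlation matrix to one is an assumption made here.

```r
# Hypothetical precision matrix: variables 1 and 3 are conditionally independent
omega <- matrix(
  c(
    2.0, -0.8, 0.0,
    -0.8, 2.0, 0.5,
    0.0, 0.5, 1.5
  ),
  nrow = 3, byrow = TRUE
)

# A valid precision matrix must be positive definite
stopifnot(all(eigen(omega, symmetric = TRUE, only.values = TRUE)$values > 0))

# Partial correlations: -omega_ij / sqrt(omega_ii * omega_jj) for i != j
phi <- -cov2cor(omega)
diag(phi) <- 1 # diagonal convention assumed for this sketch

# Correlation matrix: standardised inverse of the precision matrix
C <- cov2cor(solve(omega))

print(round(phi, 3))
print(round(C, 3))
```

The zero entry in omega gives a zero partial correlation between variables 1 and 3, while their marginal correlation in C is nonzero because both are linked to variable 2.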
References
Bodinier B, Filippi S, Nost TH, Chiquet J, Chadeau-Hyam M (2021). “Automated calibration for stability selection in penalised regression and graphical models: a multi-OMICs network application exploring the molecular response to tobacco smoking.” https://arxiv.org/abs/2106.02521.
See Also
Other simulation functions: SimulateAdjacency(), SimulateClustering(), SimulateCorrelation(), SimulateGraphical(), SimulateRegression(), SimulateStructural()
Examples
# Simulation of 3 components with high e.v.
set.seed(1)
simul <- SimulateComponents(pk = c(5, 3, 4), ev_xx = 0.4)
print(simul)
plot(simul)
plot(cumsum(simul$ev), ylim = c(0, 1), las = 1)
# Simulation of 3 components with moderate e.v.
set.seed(1)
simul <- SimulateComponents(pk = c(5, 3, 4), ev_xx = 0.25)
print(simul)
plot(simul)
plot(cumsum(simul$ev), ylim = c(0, 1), las = 1)
# Simulation of multiple components with low e.v.
pk <- sample(3:10, size = 5, replace = TRUE)
simul <- SimulateComponents(
  pk = pk,
  nu_within = 0.3, v_within = c(0.8, 0.5), v_sign = -1, ev_xx = 0.1
)
plot(simul)
plot(cumsum(simul$ev), ylim = c(0, 1), las = 1)