sim_IMIFA {IMIFA} | R Documentation |
Simulate Data from a Mixture of Factor Analysers Structure
Description
Functions to simulate data of any size and dimension from an (infinite) mixture of (infinite) factor analysers parameterisation, or from a fitted object.
Usage
sim_IMIFA_data(N = 300L,
               G = 3L,
               P = 50L,
               Q = rep(floor(log(P)), G),
               pis = rep(1/G, G),
               mu = NULL,
               psi = NULL,
               loadings = NULL,
               scores = NULL,
               nn = NULL,
               loc.diff = 2,
               non.zero = P,
               forceQg = TRUE,
               method = c("conditional", "marginal"))

sim_IMIFA_model(res,
                method = c("conditional", "marginal"))
Arguments
N, G, P |
Desired overall number of observations, number of clusters, and number of variables in the simulated data set. Each must be a single integer. |
Q |
Desired number of cluster-specific latent factors in the simulated data set. Can be specified either as a single integer if all clusters are to have the same number of factors, or a vector of length |
pis |
Mixing proportions of the clusters in the data set if |
mu |
True values of the mean parameters, either as a single value, a vector of length |
psi |
True values of uniqueness parameters, either as a single value, a vector of length |
loadings |
True values of the loadings matrix/matrices. Must be supplied in the form of a list of numeric matrices when |
scores |
True values of the latent factor scores, as a |
nn |
An alternative way to specify the size of each cluster, by giving the exact number of observations in each cluster explicitly. Must sum to |
loc.diff |
A parameter to control the closeness of the clusters in terms of the difference in their location vectors. Only relevant if More specifically,
|
non.zero |
Controls the number of non-zero entries in each loadings column (per cluster) only when Must be given as a list of length |
forceQg |
A logical indicating whether the upper limit on the number of cluster-specific factors |
method |
A switch indicating whether the mixture to be simulated from is the conditional distribution of the data given the latent variables (default), or simply the marginal distribution of the data. |
res |
An object of class |
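The distinction drawn by method can be illustrated for a single cluster in base R. The sketch below is not the package's internal code: the names mu_g, Lambda_g, psi_g, and n_g are hypothetical, and it assumes the standard factor-analysis model in which an observation equals the cluster mean plus the loadings times standard normal scores plus independent normal noise with variances given by the uniquenesses. The conditional route draws the scores explicitly; the marginal route draws directly from the implied multivariate normal.

```r
set.seed(1)
P   <- 5L     # number of variables (all values here are hypothetical)
Q   <- 2L     # number of latent factors
n_g <- 2000L  # number of observations in this one cluster

mu_g     <- rnorm(P)                     # cluster mean vector
Lambda_g <- matrix(rnorm(P * Q), P, Q)   # P x Q loadings matrix
psi_g    <- runif(P, 0.5, 1.5)           # uniquenesses (diagonal variances)

# Conditional route: draw the scores first, then the data given the scores
eta    <- matrix(rnorm(n_g * Q), n_g, Q)                         # n_g x Q scores
eps    <- matrix(rnorm(n_g * P), n_g, P) %*% diag(sqrt(psi_g))   # idiosyncratic noise
x_cond <- sweep(eta %*% t(Lambda_g) + eps, 2L, mu_g, "+")

# Marginal route: integrate the scores out and draw from N(mu_g, Sigma_g)
Sigma_g <- tcrossprod(Lambda_g) + diag(psi_g)
U       <- chol(Sigma_g)                 # upper triangular, t(U) %*% U = Sigma_g
x_marg  <- sweep(matrix(rnorm(n_g * P), n_g, P) %*% U, 2L, mu_g, "+")

# Both samples target the same covariance matrix, up to sampling error
max(abs(cov(x_cond) - Sigma_g))
max(abs(cov(x_marg) - Sigma_g))
```

Only the conditional route produces scores alongside the data; under the marginal route they are never drawn.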
Details
sim_IMIFA_model is a simple wrapper to sim_IMIFA_data which uses the estimated parameters of a fitted IMIFA related model, as generated by get_IMIFA_results. The necessary parameters must have been originally stored via storeControl in the creation of res.
Value
Invisibly returns a data.frame with N observations (rows) of P variables (columns). The true values of the parameters which generated these data are also stored as attributes.
Note
N, G, P & Q will NOT be inferred from the supplied parameters pis, mu, psi, loadings, scores & nn - rather, the parameters' length/dimensions must adhere to the supplied values of N, G, P & Q.
Missing values are not allowed in any of pis, mu, psi, loadings, scores & nn.
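Because nothing is inferred from the supplied parameters, it can help to check their shapes against N, G, P and Q before simulating. The base R sketch below does not call the package. The shapes for mu and psi follow the Examples section; the requirements that pis sums to 1, that nn sums to N, and that loadings is a list of P x Q[g] matrices are assumptions made for illustration, and the object names inputs and checks are hypothetical.

```r
N <- 100L; G <- 3L; P <- 20L
Q <- c(2L, 2L, 5L)                     # one number of factors per cluster

pis <- rep(1 / G, G)                   # mixing proportions, one per cluster
nn  <- c(34L, 33L, 33L)                # explicit cluster sizes
mu  <- matrix(0, nrow = P, ncol = G)   # P x G matrix of means, as in the Examples
psi <- 1:3                             # one isotropic uniqueness per cluster
# Assumed shape: one P x Q[g] loadings matrix per cluster
loadings <- lapply(Q, function(q) matrix(rnorm(P * q), nrow = P, ncol = q))

inputs <- list(pis = pis, nn = nn, mu = mu, psi = psi, loadings = loadings)

checks <- c(
  Q_length     = length(Q) == G,
  pis_length   = length(pis) == G,
  pis_sum      = isTRUE(all.equal(sum(pis), 1)),
  nn_length    = length(nn) == G,
  nn_sum       = sum(nn) == N,
  mu_dim       = identical(dim(mu), c(P, G)),
  psi_length   = length(psi) == G,
  loadings_dim = all(mapply(function(L, q) identical(dim(L), c(P, q)), loadings, Q)),
  no_missing   = !anyNA(unlist(inputs))
)

# Stops with an error naming the failing condition if any check is FALSE
stopifnot(checks)
```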
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
mcmc_IMIFA for fitting an IMIFA related model to the simulated data set.
get_IMIFA_results for generating input for sim_IMIFA_model.
Ledermann for details on the upper-bound for Q. Note that this function accounts for isotropic uniquenesses, if psi is supplied in that manner, in computing this bound.
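For orientation, the classical Ledermann bound can be computed in a few lines of base R. This is a sketch of the textbook formula only, namely the largest integer Q with (P - Q)^2 >= P + Q; the function name ledermann_sketch is hypothetical, it is not the package's Ledermann function, and it does not reproduce the adjustment for isotropic uniquenesses mentioned above.

```r
# Largest integer Q satisfying (P - Q)^2 >= P + Q (classical Ledermann bound)
ledermann_sketch <- function(P) {
  stopifnot(is.numeric(P), all(P >= 1))
  floor((2 * P + 1 - sqrt(8 * P + 1)) / 2)
}

ledermann_sketch(c(20, 50))   # 14 40

# The cluster-specific Q used in the Examples (P = 20) sits within this bound
all(c(2, 2, 5) <= ledermann_sketch(20))   # TRUE
```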
Examples
# Simulate 100 observations from 3 balanced clusters with cluster-specific numbers of latent factors
# Specify isotropic uniquenesses within each cluster
# Supply cluster means directly
sim_data <- sim_IMIFA_data(N=100, G=3, P=20, Q=c(2, 2, 5), psi=1:3,
                           mu=matrix(rnorm(60, -2 + 1:3, 1), nrow=20, ncol=3, byrow=TRUE))
names(attributes(sim_data))
labels <- attr(sim_data, "Labels")
# Visualise the data in two-dimensions
plot(cmdscale(dist(sim_data), k=2), col=labels)
# Examine the overlap with a pairs plot of 5 randomly chosen variables
pairs(sim_data[,sample(1:20, 5)], col=labels)
# Fit a MIFA model to this data
# tmp <- mcmc_IMIFA(sim_data, method="MIFA", range.G=3, n.iters=5000)
# Simulate from this model
# res <- get_IMIFA_results(tmp, zlabels=labels)
# sim_mod <- sim_IMIFA_model(res)