get_IMIFA_results {IMIFA} | R Documentation |
Extract results, conduct posterior inference and compute performance metrics for MCMC samples of models from the IMIFA family
Description
This function post-processes simulations generated by mcmc_IMIFA for any of the IMIFA family of models. This includes accounting for label switching and accounting for rotational invariance via Procrustean methods. It can be re-run at little computational cost in order to extract different models explored by the sampler used for sims, without having to re-run the model itself. New results objects using different numbers of clusters and different numbers of factors (if visited by the model in question), or using different model selection criteria (if necessary), can be generated with ease. Posterior predictive checking of the appropriateness of the fitted model is also facilitated.
Usage
get_IMIFA_results(sims = NULL,
                  burnin = 0L,
                  thinning = 1L,
                  G = NULL,
                  Q = NULL,
                  criterion = c("bicm", "aicm", "dic", "bic.mcmc", "aic.mcmc"),
                  adapt = FALSE,
                  G.meth = c("mode", "median"),
                  Q.meth = c("mode", "median"),
                  conf.level = 0.95,
                  error.metrics = TRUE,
                  vari.rot = FALSE,
                  z.avgsim = FALSE,
                  zlabels = NULL,
                  nonempty = TRUE,
                  ...)
## S3 method for class 'Results_IMIFA'
print(x,
      ...)
## S3 method for class 'Results_IMIFA'
summary(object,
        MAP = TRUE,
        ...)
Arguments
sims |
An object of class |
burnin |
Optional additional number of iterations to discard. Defaults to 0, corresponding to no additional burnin. See |
thinning |
Optional interval for extra thinning to be applied. Defaults to 1, corresponding to no additional thinning. See |
G |
If this argument is not specified, results will be returned with the optimal number of clusters. If different numbers of clusters were explored in Similarly, this allows retrieval of samples corresponding to a solution, if visited, with |
Q |
If this argument is not specified, results will be returned with the optimal number of factors. If different numbers of factors were explored in Similarly, this allows retrieval of samples corresponding to a solution, if visited, with If adaptation didn't take place during model-fitting, |
criterion |
The criterion to use for model selection, where model selection is only required if more than one model was run under the Note that the first three options here might exhibit bias in favour of zero-factor models for the finite factor |
adapt |
A logical indicating if adaptation should be applied to the stored loadings and scores matrices to truncate the cluster-specific number(s) of non-redundant factors. This argument is only relevant if |
G.meth |
If the object in |
Q.meth |
If the object in |
conf.level |
The confidence level to be used throughout for credible intervals for all parameters of inferential interest, and error metrics if |
error.metrics |
A logical activating or deactivating posterior predictive checking: i.e. controlling whether metrics quantifying a) the posterior predictive reconstruction error (PPRE) between bin counts of the data and bin counts of replicate draws from the posterior distribution and b) the error between the empirical and estimated covariance matrices should be computed. These are computed for every valid retained iteration (see The Frobenius norm is used in the computation of the PPRE, by default, but the |
vari.rot |
Logical indicating whether the loadings matrix/matrices template(s) should be |
z.avgsim |
Logical (defaults to Note that the MAP clustering is computed conditional on the estimate of the number of clusters (whether that be the modal estimate or the estimate according to Please be warned that this feature requires loading the |
zlabels |
For any method that performs clustering, the true labels can be supplied if they are known in order to compute clustering performance metrics. This also has the effect of ordering the MAP labels (and thus the ordering of cluster-specific parameters) to most closely correspond to the true labels if supplied. |
nonempty |
For |
x, object, MAP, ... |
Arguments required for the Users can also pass the When Finally, the |
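As a rough illustration of what the "mode" and "median" options of G.meth and Q.meth and the conf.level argument refer to, the base R sketch below works on hypothetical stored samples (G_samples and mu_samples are made-up names and values). The use of equal-tailed quantile intervals is an assumption made for illustration and is not necessarily how the package constructs its credible intervals.

```r
# Hypothetical stored samples of the number of clusters
G_samples <- c(2, 3, 3, 3, 4, 3, 2, 3, 3, 4)

# "mode": the most frequently visited value
G_tab  <- table(G_samples)
G_mode <- as.integer(names(G_tab)[which.max(G_tab)])

# "median": the posterior median, rounded to an integer
G_median <- as.integer(round(median(G_samples)))

# Equal-tailed credible interval at level conf.level (assumed form)
conf.level <- 0.95
set.seed(1)
mu_samples <- rnorm(1000, mean = 5, sd = 1)   # hypothetical parameter draws
alpha <- (1 - conf.level) / 2
ci    <- quantile(mu_samples, probs = c(alpha, 1 - alpha))
```

The same pattern would apply to samples of the number of factors when Q.meth is used.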
Details
The function performs post-hoc corrections for label switching, as well as post-hoc Procrustes rotation of loadings matrices and scores, in order to ensure sensible posterior parameter estimates. It also computes error metrics, constructs credible intervals, and generally transforms the raw sims object into an object of class "Results_IMIFA" in order to prepare the results for plotting via plot.Results_IMIFA.
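To give a sense of what a Procrustes rotation does, the following minimal base R sketch aligns a rotated copy of a loadings matrix to a template via the standard orthogonal Procrustes solution. It is not the package's own Procrustes function; procrustes_rotate, template, and sampled are hypothetical names used only for illustration.

```r
# Orthogonal Procrustes rotation of X onto a template Xstar (base R only)
procrustes_rotate <- function(X, Xstar) {
  sv <- svd(crossprod(X, Xstar))   # SVD of t(X) %*% Xstar
  R  <- sv$u %*% t(sv$v)           # orthogonal matrix minimising ||X R - Xstar||_F
  X %*% R
}

set.seed(42)
template <- matrix(rnorm(12), nrow = 6, ncol = 2)   # hypothetical template loadings
theta    <- pi / 3
rot      <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
sampled  <- template %*% rot                        # same loadings, arbitrarily rotated

aligned  <- procrustes_rotate(sampled, template)
max(abs(aligned - template))                        # numerically close to zero
```

Because factor loadings are only identified up to rotation, aligning every sampled matrix to a common template in this way is what makes averaging across iterations meaningful.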
For the infinite factor methods, iterations where the maximum number of factors was greater than or equal to the maximum of the estimated cluster-specific factors are retained for posterior summaries of the scores, in order to preserve the estimated dimension of the scores matrices. Similarly, these are also the valid iterations used for the computation of the averages and credible intervals for the error metrics. For the finite factor models, all retained iterations are used in both instances (i.e. both for the scores and the error metrics).
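The retention rule for the infinite factor methods can be sketched in base R as follows. Q_store is a hypothetical matrix of sampled cluster-specific numbers of factors, and using the modal value as the estimate is an assumption for the purposes of this example.

```r
# Hypothetical stored numbers of factors: rows = clusters, columns = iterations
Q_store <- rbind(c(2, 3, 3, 2, 3, 4),
                 c(1, 2, 2, 2, 1, 2))

# Modal estimate of the number of factors in each cluster
modal <- function(x) as.integer(names(which.max(table(x))))
Q_hat <- apply(Q_store, 1, modal)

# Keep iterations whose largest number of factors is at least max(Q_hat),
# so that the scores matrix has enough columns at every retained iteration
Q_max_iter <- apply(Q_store, 2, max)
keep       <- which(Q_max_iter >= max(Q_hat))
```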
In all cases, only iterations with G non-empty components are retained.
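A minimal sketch of this filtering step, assuming the sampled cluster labels are available as a matrix with one row per iteration (z_store is a hypothetical name, not a documented component of the sims object):

```r
# Hypothetical sampled labels: rows = iterations, columns = observations
z_store <- rbind(c(1, 1, 2, 2, 3),
                 c(1, 1, 2, 2, 2),
                 c(3, 1, 2, 2, 1),
                 c(1, 1, 1, 2, 2))
G <- 3   # number of clusters being conditioned on

# Number of non-empty components at each iteration
n_nonempty <- apply(z_store, 1, function(z) length(unique(z)))

# Retain only the iterations with exactly G non-empty components
keep <- which(n_nonempty == G)
```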
Value
An object of class "Results_IMIFA" to be passed to plot.Results_IMIFA for visualising results. Dedicated print and summary functions also exist for objects of this class. The object, say x, is a list of lists, the most important components of which are:
Clust |
Everything pertaining to clustering performance can be found here for all but the More detail is given if known |
Error |
Everything pertaining to the assessment of model fit can be found here, incl. the distribution of the PPRE values and associated bin counts for the replicate draws, as well as average error metrics (e.g. MSE, RMSE) between the empirical and estimated covariance matrix/matrices, both of which are also included, and credible intervals quantifying the associated uncertainty. |
GQ.results |
Everything pertaining to model choice can be found here, incl. posterior summaries for the estimated number of clusters and estimated number of factors, if applicable to the method employed. Model selection criterion values are also accessible here. |
Means |
Posterior summaries for the means, after conditioning on |
Loadings |
Posterior summaries for the factor loadings matrix/matrices, after conditioning on The number of iterations retained for posterior summaries of the loadings may vary for different clusters for the infinite factor methods, corresponding to iterations where the cluster-specific number of factors was greater than or equal to the modal estimate of the cluster-specific number of factors. |
Scores |
Posterior summaries for the latent factor scores, after conditioning on the maximum of the estimated number of cluster-specific factors. Summaries are given for the single matrix of factor scores. See For the infinite factor methods, iterations where the maximum number of factors was greater than or equal to the maximum of the estimated cluster-specific factors are retained for posterior summaries of the scores, in order to preserve the estimated dimension of the scores matrices. |
Uniquenesses |
Posterior summaries for the uniquenesses, after conditioning on |
The objects Means, Loadings, Scores and Uniquenesses (if stored when calling mcmc_IMIFA!) also contain, in addition to the posterior summaries, the entire chain of valid samples of each and, for convenience, the last valid samples of each (after conditioning on the modal G and Q values, and accounting for label switching and for rotational invariance via Procrustes rotation).
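To make the covariance-based error metrics reported under Error more concrete, the sketch below compares an empirical covariance matrix with a hypothetical factor-analytic estimate in base R. The data, Lambda, and Psi are all simulated for illustration, and the element-wise definitions of MSE and RMSE used here are assumptions rather than the package's exact formulas.

```r
set.seed(123)
X       <- matrix(rnorm(200), nrow = 50, ncol = 4)   # hypothetical data
emp_cov <- cov(X)                                    # empirical covariance

# Hypothetical estimated covariance of factor-analytic form: Lambda Lambda' + Psi
Lambda  <- matrix(rnorm(8, sd = 0.3), nrow = 4, ncol = 2)
Psi     <- diag(runif(4, 0.5, 1.5))
est_cov <- tcrossprod(Lambda) + Psi

err  <- emp_cov - est_cov
mse  <- mean(err^2)              # mean squared error over matrix entries
rmse <- sqrt(mse)                # root mean squared error
frob <- norm(err, type = "F")    # Frobenius norm via base R's norm()
```

Computing such metrics at every retained iteration, rather than once at the posterior mean, is what allows credible intervals to be attached to them.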
Note
For the "IMIFA", "IMFA", "OMIFA", and "OMFA" methods, the retained mixing proportions are renormalised after conditioning on the modal G. This is especially necessary for the computation of the error.metrics; note, however, that the values on which posterior inference is conducted will differ ever so slightly from the values actually sampled.
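The renormalisation described above can be sketched in base R as follows. The matrix pi_store is hypothetical, and treating the retained components as simply the first G rows is an assumption made to keep the example short.

```r
# Hypothetical sampled mixing proportions: rows = components, columns = iterations
pi_store <- cbind(c(0.50, 0.30, 0.15, 0.05),
                  c(0.45, 0.35, 0.18, 0.02),
                  c(0.60, 0.25, 0.10, 0.05))
G <- 3   # modal number of clusters

# Keep G components (assumed here to be the first G rows) and
# rescale each iteration so that the proportions sum to one again
pi_G    <- pi_store[seq_len(G), , drop = FALSE]
pi_norm <- sweep(pi_G, 2, colSums(pi_G), "/")
```

The rescaled columns differ slightly from the sampled ones, which is the small discrepancy the note refers to.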
Due to the way the offline label-switching correction is performed, different runs of this function may give very slightly different results in terms of the cluster labellings (and by extension the parameters, which are permuted in the same way), but only if the chain was run for an extremely small number of iterations, well below the number required for convergence, and samples of the cluster labels match poorly across iterations (particularly if the number of clusters suggested by those sampled labels is high).
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
plot.Results_IMIFA, mcmc_IMIFA, Zsimilarity, scores_MAP, sim_IMIFA_model, Procrustes, varimax, norm, mgpControl, storeControl
Examples
# data(coffee)
# data(olive)
# Run a MFA model on the coffee data over a range of clusters and factors.
# simMFAcoffee <- mcmc_IMIFA(coffee, method="MFA", range.G=2:3, range.Q=0:3, n.iters=1000)
# Accept all defaults to extract the optimal model.
# resMFAcoffee <- get_IMIFA_results(simMFAcoffee)
# Instead, let's get results for a 3-cluster model, allowing Q to be chosen by aic.mcmc.
# resMFAcoffee2 <- get_IMIFA_results(simMFAcoffee, G=3, criterion="aic.mcmc")
# Run an IMIFA model on the olive data, accepting all defaults.
# simIMIFAolive <- mcmc_IMIFA(olive, method="IMIFA", n.iters=10000)
# Extract optimum results
# Estimate G & Q by the median of their posterior distributions
# Construct 90% credible intervals and try to return the similarity matrix.
# resIMIFAolive <- get_IMIFA_results(simIMIFAolive, G.meth="median", Q.meth="median",
# conf.level=0.9, z.avgsim=TRUE)
# summary(resIMIFAolive)
# Simulate new data from the above model
# newdata <- sim_IMIFA_model(resIMIFAolive)
# Fit an IFA model without adaptation and examine results with and without post-hoc adaptation
# Use the "SC" criterion for determining the number of active factors
# simIFAnoadapt <- mcmc_IMIFA(olive, method="IFA", n.iters=10000, adapt=FALSE)
# resIFAnoadapt <- get_IMIFA_results(simIFAnoadapt)
# resIFApostadapt <- get_IMIFA_results(simIFAnoadapt, adapt=TRUE, active.crit="SC")
# Compare to an IFA model with adaptive Gibbs sampling
# simIFAadapt <- mcmc_IMIFA(coffee, method="IFA", n.iters=10000, active.crit="BD")
# resIFAadapt <- get_IMIFA_results(simIFAadapt)
# plot(resIFAnoadapt, "GQ")
# plot(resIFApostadapt, "GQ")
# plot(resIFAadapt, "GQ")