gumbel_max {IMIFA} | R Documentation |
Simulate Cluster Labels from Unnormalised Log-Probabilities using the Gumbel-Max Trick
Description
Samples cluster labels for N observations from G clusters efficiently using log-probabilities and the so-called Gumbel-Max trick, without requiring that the log-probabilities be normalised; thus redundant computation can be avoided.
Usage
gumbel_max(probs,
slice = FALSE)
Arguments
probs |
An N x G matrix of unnormalised probabilities on the log scale, where N is the number of observations that require labels to be sampled and G is the number of active clusters, such that sampled labels can take values in |
slice |
A logical indicating whether or not the indicator correction for slice sampling has been applied to |
Details
Computation takes place on the log scale for stability/underflow reasons (to ensure negligible probabilities won't cause computational difficulties); in any case, many functions for calculating multivariate normal densities already output on the log scale.
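The idea behind the trick can be sketched in base R as follows. This is a minimal illustration and not the package's own implementation: independent standard Gumbel noise is added to each unnormalised log-probability, and the row-wise argmax is then a draw from the corresponding categorical distribution. The name gumbel_max_sketch is hypothetical, the input is assumed to be a matrix, and the slice argument is not handled.

```r
# Hypothetical helper illustrating the Gumbel-Max trick (not IMIFA's own code).
# log_probs: N x G matrix of unnormalised log-probabilities (assumed a matrix).
gumbel_max_sketch <- function(log_probs) {
  # Standard Gumbel draws: -log(-log(U)) with U ~ Uniform(0, 1)
  gumbel <- -log(-log(runif(length(log_probs))))
  # Row-wise argmax of the perturbed log-probabilities gives the labels
  max.col(log_probs + gumbel, ties.method = "first")
}

set.seed(1)
w  <- c(1, 2, 3)
lp <- matrix(log(w), nrow = 20000, ncol = 3, byrow = TRUE)
zs_sketch <- gumbel_max_sketch(lp)

# Empirical label frequencies should be close to w / sum(w)
round(tabulate(zs_sketch, nbins = 3) / nrow(lp), 2)
```

Because only the argmax matters, adding the same constant to every entry of a row leaves the sampled label unchanged, which is why the log-probabilities need not be normalised.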
Value
A vector of N sampled cluster labels, with the largest label no greater than G.
Note
Though the function is available for standalone use, note that no checks take place, in order to speed up repeated calls to the function inside mcmc_IMIFA.
If the normalising constant is required for another reason, e.g. to compute the log-likelihood, it can be calculated by summing the output obtained by calling rowLogSumExps on probs.
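As a rough illustration of the normalising constant described in this note, the sketch below computes a numerically stable row-wise log-sum-exp in base R. The helper row_lse is a hypothetical stand-in for rowLogSumExps, written out only to show the calculation; it assumes a matrix input in which every row has at least one finite entry.

```r
# Hypothetical base-R stand-in for a row-wise log-sum-exp
row_lse <- function(log_probs) {
  m <- apply(log_probs, 1, max)          # row maxima, subtracted for stability
  m + log(rowSums(exp(log_probs - m)))   # log(sum(exp(row))) for each row
}

# Unnormalised log-probabilities for 2 observations and 3 clusters
lp <- log(rbind(c(1, 2, 3), c(2, 2, 4)))

norm_const <- row_lse(lp)       # per-observation log normalising constants
log_lik    <- sum(norm_const)   # summed over observations, as in the note

# Subtracting the constant row-wise gives normalised probabilities
post <- exp(lp - norm_const)
rowSums(post)
```

Subtracting each row's maximum before exponentiating avoids underflow when all entries in a row are very negative, which is the same concern that motivates working on the log scale.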
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Yellott, J. I. Jr. (1977) The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution, Journal of Mathematical Psychology, 15(2): 109-144.
See Also
Examples
# Create weights for 3 components
G <- 3
weights <- seq_len(G)
# Call gumbel_max() repeatedly to obtain samples of the labels, zs
iters <- 10000
zs <- replicate(iters, gumbel_max(probs=log(weights)))
# Compare answer to the normalised weights
tabulate(zs, nbins=G)/iters
(normalised <- as.numeric(weights/sum(weights)))
# Simulate a matrix of Dirichlet weights & the associated vector of N labels
N <- 400
G <- 8
sizes <- seq(from=85, to=15, by=-10)
weights <- matrix(rDirichlet(N * G, alpha=1, nn=sizes), byrow=TRUE, nrow=N, ncol=G)
(zs <- gumbel_max(probs=log(weights)))