piv_MCMC {pivmet}    R Documentation

JAGS/Stan Sampling for Gaussian Mixture Models and Clustering via Co-Association Matrix.

Description

Perform MCMC JAGS sampling or HMC Stan sampling for Gaussian mixture models, post-process the chains and apply a clustering technique to the MCMC sample. Pivotal units for each group are selected according to one of four alternative criteria.

Usage

piv_MCMC(
  y,
  k,
  nMC,
  priors,
  piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
  clustering = c("diana", "hclust"),
  software = c("rjags", "rstan"),
  burn = 0.5 * nMC,
  chains = 4,
  cores = 1,
  sparsity = FALSE
)

Arguments

y

N-dimensional vector for univariate data or N \times D matrix for multivariate data.

k

Number of mixture components.

nMC

Number of MCMC iterations for the JAGS/Stan function execution.

priors

Input prior hyperparameters (see Details for default options).

piv.criterion

The pivotal criterion used for identifying one pivot for each group. Possible choices are: "MUS", "maxsumint", "minsumnoint", "maxsumdiff". The default method is "maxsumint" (see the Details and the vignette).

clustering

The algorithm adopted for partitioning the N observations into k groups. Possible choices are "diana" (default) or "hclust", for divisive and agglomerative hierarchical clustering, respectively.

software

The selected MCMC method to fit the model: "rjags" for the JAGS method, "rstan" for the Stan method. Default is "rjags".

burn

The burn-in period (only if software "rjags" is selected). Default is 0.5 * nMC.

chains

A positive integer specifying the number of Markov chains. The default is 4.

cores

The number of cores to use when executing the Markov chains in parallel (only if software="rstan"). Default is 1.

sparsity

Logical, allowing for sparse finite mixtures. Default is FALSE.

Details

The function fits univariate and multivariate Bayesian Gaussian mixture models of the form (shown here for the univariate case only):

(Y_i \mid Z_i = j) \sim \mathcal{N}(\mu_j, \sigma_j),

where the Z_i, i = 1,\ldots,N, are i.i.d. random variables, j = 1,\ldots,k, \sigma_j is the group variance, Z_i \in \{1,\ldots,k\} is the latent group allocation, and

P(Z_i = j) = \eta_j.

The likelihood of the model is then

L(y; \mu, \eta, \sigma) = \prod_{i=1}^{N} \sum_{j=1}^{k} \eta_j \mathcal{N}(\mu_j, \sigma_j),

where (\mu, \sigma) = (\mu_1,\ldots,\mu_k, \sigma_1,\ldots,\sigma_k) are the component-specific parameters and \eta = (\eta_1,\ldots,\eta_k) are the mixture weights. Let \nu denote a permutation of \{1,\ldots,k\}, and let \nu(\mu) = (\mu_{\nu(1)},\ldots,\mu_{\nu(k)}), \nu(\sigma) = (\sigma_{\nu(1)},\ldots,\sigma_{\nu(k)}) and \nu(\eta) = (\eta_{\nu(1)},\ldots,\eta_{\nu(k)}) be the corresponding permutations of \mu, \sigma and \eta. Denoting by V the set of all permutations of the indexes \{1,\ldots,k\}, the likelihood above is invariant under any permutation \nu \in V, that is

L(y; \mu, \eta, \sigma) = L(y; \nu(\mu), \nu(\eta), \nu(\sigma)).

As a consequence, the model is unidentified with respect to an arbitrary permutation of the labels. When Bayesian inference is performed, if the prior distribution p_0(\mu, \eta, \sigma) is invariant under a permutation of the indices, then so is the posterior. That is, if p_0(\mu, \eta, \sigma) = p_0(\nu(\mu), \nu(\eta), \nu(\sigma)) for every \nu \in V, then

p(\mu, \eta, \sigma \mid y) \propto p_0(\mu, \eta, \sigma) L(y; \mu, \eta, \sigma)

is multimodal with (at least) k! modes.
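This invariance is easy to verify numerically. The following sketch (plain R, independent of pivmet, with arbitrary illustrative parameter values) evaluates the mixture log-likelihood before and after a relabelling and returns the same value:

# Label-switching invariance: the mixture log-likelihood is unchanged
# when (mu, sigma, eta) are jointly permuted.
set.seed(1)
y     <- rnorm(50)
mu    <- c(-1, 0, 2)
sigma <- c(1, 0.5, 2)
eta   <- c(0.3, 0.5, 0.2)

mix_loglik <- function(y, mu, sigma, eta) {
  dens <- sapply(seq_along(mu),
                 function(j) eta[j] * dnorm(y, mu[j], sigma[j]))
  sum(log(rowSums(dens)))
}

nu <- c(3, 1, 2)                           # an arbitrary permutation
mix_loglik(y, mu, sigma, eta)              # same value ...
mix_loglik(y, mu[nu], sigma[nu], eta[nu])  # ... after relabelling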

The model parametrization, in terms of prior choices, depends on the selected software. In line with the JAGS philosophy and its underlying Gibbs sampling, noninformative priors are used, with conjugate priors preferred for computational speed. Conversely, Stan adopts weakly informative priors, with no need for conjugacy. For univariate mixtures, when software="rjags" the specification is the same as in the function BMMmodel of the bayesmix package:

\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)

\sigma_j \sim \mbox{invGamma}(nu0Half, nu0S0Half)

\eta \sim \mbox{Dirichlet}(1,\ldots,1)

S0 \sim \mbox{Gamma}(g0Half, g0G0Half),

with default values \mu_0 = 0, B0inv = 0.1, nu0Half = 10, S0 = 2, nu0S0Half = nu0Half \times S0, g0Half = 5e-17, g0G0Half = 5e-33, in accordance with the default specification:

priors=list(kind = "independence", parameter = "priorsFish", hierarchical = "tau")

(see bayesmix for further details and choices).

When software="rstan", the prior specification is:

\mu_j \sim \mathcal{N}(\mu_0, 1/B0inv)

\sigma_j \sim \mbox{Lognormal}(\mu_{\sigma}, \tau_{\sigma})

\eta_j \sim \mbox{Uniform}(0,1),

with default values \mu_0 = 0, B0inv = 0.1, \mu_{\sigma} = 0, \tau_{\sigma} = 2. Users may specify new hyperparameter values with the argument:

priors=list(mu_0=1, B0inv=0.2, mu_sigma=3, tau_sigma=5)

For multivariate mixtures, when software="rjags" the prior specification is the following:

\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, S2)

\Sigma^{-1} \sim \mbox{Wishart}(S3, D+1)

\eta \sim \mbox{Dirichlet}(\bm{\alpha}),

where \bm{\alpha} is a k-dimensional vector and S2 and S3 are positive definite matrices. By default, \bm{\mu}_0 = \bm{0}, \bm{\alpha} = (1,\ldots,1), and S2 and S3 are diagonal matrices with diagonal elements equal to 1e+05. The user may specify other values for the hyperparameters \bm{\mu}_0, S2, S3 and \bm{\alpha} via the priors argument, in such a way:

priors = list(mu_0 = c(1,1), S2 = ..., S3 = ..., alpha = ...)

with the constraint for S2 and S3 to be positive definite, and \bm{\alpha} a vector of dimension k with nonnegative elements.
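As a concrete illustration, a fully specified priors list for a bivariate mixture could look as follows (the numerical values are arbitrary, not package defaults):

# Illustrative hyperparameters for D = 2; S2 and S3 must be positive
# definite, alpha a k-vector with nonnegative elements.
priors <- list(mu_0  = c(1, 1),
               S2    = diag(0.1, 2),   # 2 x 2 positive definite
               S3    = diag(0.1, 2),   # 2 x 2 positive definite
               alpha = rep(1, k))      # k nonnegative elements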

When software="rstan", the prior specification is:

\bm{\mu}_j \sim \mathcal{N}_D(\bm{\mu}_0, LD^*L^{T})

L \sim \mbox{LKJ}(\epsilon)

D^*_j \sim \mbox{HalfCauchy}(0, \sigma_d).

The covariance matrix is expressed in terms of the LDL decomposition as LD^*L^{T}, a variant of the classical Cholesky decomposition, where L is a D \times D lower unit triangular matrix and D^* is a D \times D diagonal matrix. The Cholesky correlation factor L is assigned an LKJ prior with \epsilon degrees of freedom which, combined with priors on the standard deviations of each component, induces a prior on the covariance matrix; as \epsilon \rightarrow \infty the magnitude of correlations between components decreases, whereas \epsilon = 1 leads to a uniform prior distribution for L. By default, the hyperparameters are \bm{\mu}_0 = \bm{0}, \sigma_d = 2.5, \epsilon = 1. The user may propose different values with the argument:

priors = list(mu_0 = c(1,2), sigma_d = 4, epsilon = 2)
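A minimal sketch in plain R (with illustrative values for L and D^*, unrelated to the package defaults) shows how the LD^*L^{T} factorization yields a valid covariance matrix:

# L is lower unit triangular, Dstar diagonal with positive entries,
# so Sigma = L Dstar L^T is positive definite by construction.
L     <- matrix(c(1,   0,
                  0.6, 1), nrow = 2, byrow = TRUE)
Dstar <- diag(c(1.5, 0.8))
Sigma <- L %*% Dstar %*% t(L)
Sigma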

If software="rjags", the function performs JAGS sampling using the bayesmix package for univariate Gaussian mixtures and the runjags package for multivariate Gaussian mixtures. If software="rstan", the function performs Hamiltonian Monte Carlo (HMC) sampling via the rstan package (see the vignette and the Stan project for further help).

After MCMC sampling, this function clusters the units into k groups, calls the piv_sel() function and yields the pivots obtained by one of four alternative methods, specified via the piv.criterion argument: "maxsumint", "minsumnoint", "maxsumdiff" and "MUS" (the latter available only if k <= 4; see the vignette for thorough details). For the computational reasons clarified in the Details section of the function piv_rel, the length of the MCMC chains will be less than or equal to the input argument nMC; this length, corresponding to the value true.iter returned by the procedure, is the number of MCMC iterations for which the number of JAGS/Stan groups exactly coincides with the prespecified number of groups k. A sketch of the typical downstream workflow is given below.
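A minimal sketch of this workflow, on simulated univariate data (the data-generating values are arbitrary; compare with the Examples below and the piv_rel documentation):

library(pivmet)
set.seed(123)
# three well-separated univariate components
y   <- c(rnorm(100, -3), rnorm(100, 0), rnorm(100, 3))
res <- piv_MCMC(y = y, k = 3, nMC = 2000,
                piv.criterion = "maxsumint")
rel <- piv_rel(mcmc = res)   # relabel the true.iter retained iterations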

Value

The function returns the MCMC output, the clustering solutions and the pivotal indexes. A complete list of the outputs follows; a short inspection sketch is given after the list.

true.iter

The number of MCMC iterations for which the number of JAGS/Stan groups exactly coincides with the prespecified number of groups k.

Mu

An estimate of the groups' means.

groupPost

A true.iter \times N matrix with values from 1:k indicating the post-processed group allocation vector.

mcmc_mean

If y is a vector, a true.iter \times k matrix with the post-processed MCMC chains for the mean parameters; if y is a matrix, a true.iter \times D \times k array with the post-processed MCMC chains for the mean parameters.

mcmc_sd

If y is a vector, a true.iter \times k matrix with the post-processed MCMC chains for the sd parameters; if y is a matrix, a true.iter \times D array with the post-processed MCMC chains for the sd parameters.

mcmc_weight

A true.iter \times k matrix with the post-processed MCMC chains for the weight parameters.

mcmc_mean_raw

If y is a vector, a (nMC - burn) \times k matrix with the raw MCMC chains for the mean parameters as given by JAGS; if y is a matrix, a (nMC - burn) \times D \times k array with the raw MCMC chains for the mean parameters as given by JAGS/Stan.

mcmc_sd_raw

If y is a vector, a (nMC - burn) \times k matrix with the raw MCMC chains for the sd parameters as given by JAGS/Stan; if y is a matrix, a (nMC - burn) \times D array with the raw MCMC chains for the sd parameters as given by JAGS/Stan.

mcmc_weight_raw

A (nMC - burn) \times k matrix with the raw MCMC chains for the weight parameters as given by JAGS/Stan.

C

The N \times N co-association matrix constructed from the MCMC sample.

grr

The vector of cluster membership returned by "diana" or "hclust".

pivots

The vector of indices of pivotal units identified by the selected pivotal criterion.

model

The JAGS/Stan model code. Use the cat() function for a nicely formatted printout of the code.

stanfit

An object of S4 class stanfit for the fitted model (only if software="rstan").
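As a quick orientation, the components above can be inspected as follows (a sketch, assuming res holds the output of a piv_MCMC call as in the Examples):

cat(res$model)       # print the JAGS/Stan model code
res$pivots           # indices of the pivotal units
table(res$grr)       # cluster sizes from "diana"/"hclust"
dim(res$mcmc_mean)   # true.iter x k, or true.iter x D x k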

Author(s)

Leonardo Egidi legidi@units.it

References

Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.

Examples


### Bivariate simulation

## Not run: 
N   <- 200
k   <- 4
D   <- 2
nMC <- 1000
M1  <- c(-.5,8)
M2  <- c(25.5,.1)
M3  <- c(49.5,8)
M4  <- c(63.0,.1)
Mu  <- rbind(M1,M2,M3,M4)
Sigma.p1 <- diag(D)
Sigma.p2 <- 20*diag(D)
W <- c(0.2,0.8)
sim <- piv_sim(N = N, k = k, Mu = Mu,
               Sigma.p1 = Sigma.p1,
               Sigma.p2 = Sigma.p2, W = W)

## rjags (default)
res <- piv_MCMC(y = sim$y, k = k, nMC = nMC)

## rstan
res_stan <- piv_MCMC(y = sim$y, k = k, nMC = nMC,
                     software = "rstan")

# changing priors
res2 <- piv_MCMC(y = sim$y,
                 priors = list(
                   mu_0 = c(1,1),
                   S2 = matrix(c(0.002, 0, 0, 0.1), 2, 2, byrow = TRUE),
                   S3 = matrix(c(0.1, 0, 0, 0.1), 2, 2, byrow = TRUE)),
                 k = k, nMC = nMC)

## End(Not run)


### Fishery data (bayesmix package)

## Not run: 
library(bayesmix)
data(fish)
y <- fish[,1]
k <- 5
nMC <- 5000
res <- piv_MCMC(y = y, k = k, nMC = nMC)

# changing priors
res2 <- piv_MCMC(y = y,
                 priors = list(kind = "condconjugate",
                               parameter = "priorsRaftery",
                               hierarchical = "tau"),
                 k = k, nMC = nMC)

## End(Not run)
