R: MCMC for Pitman-Yor mixtures of Gaussians

PYdensity {BNPmix}

R Documentation

MCMC for Pitman-Yor mixtures of Gaussians

Description

The PYdensity function generates a posterior density sample for a selection of univariate and multivariate Pitman-Yor process mixture models with Gaussian kernels. See details below for the description of the different specifications of the implemented models.

Usage

PYdensity(y, mcmc = list(), prior = list(), output = list())

Arguments

`y`	a vector or matrix giving the data based on which the density is to be estimated;
`mcmc`	a list of MCMC arguments: `niter` (mandatory), number of iterations. `nburn` (mandatory), number of iterations to discard as burn-in. `method`, the MCMC sampling method to be used, options are `'ICS'`, `'MAR'` and `'SLI'` (default is `'ICS'`). See details. `model`, the type of model to be fitted (default is `'LS'`). See details. `nupd`, argument controlling the number of iterations to be displayed on screen: the function reports on standard output every time `nupd` new iterations have been carried out (default is `niter/10`). `print_message`, control option. If equal to `TRUE`, the status is printed to standard output every `nupd` iterations (default is `TRUE`). `m_imp`, number of generated values for the importance sampling step of `method = 'ICS'` (default is 10). See details. `slice_type`, when `method = 'SLI'` it specifies the type of slice sampler. Options are `'DEP'` for dependent slice-efficient, and `'INDEP'` for independent slice-efficient (default is `'DEP'`). See details. `hyper`, if equal to `TRUE`, hyperprior distributions on the base measure's parameters are added, as specified in `prior` and explained in `details` (default is `TRUE`).
`prior`	a list giving the prior information. The list includes `strength` and `discount`, the strength and discount parameters of the Pitman-Yor process (default are `strength = 1` and `discount = 0`, the latter leading to the Dirichlet process). The remaining parameters depend on the model choice. If `model = 'L'` (location mixture) and `y` is univariate: `m0` and `s20` are mean and variance of the base measure on the location parameter (default are sample mean and sample variance of the data); `a0` and `b0` are shape and scale parameters of the inverse gamma prior on the common scale parameter (default are 2 and the sample variance of the data). If `hyper = TRUE`, optional hyperpriors on the base measure's parameters are added: specifically, `m1` and `k1` are the mean parameter and the scale factor defining the normal hyperprior on `m0` (default are the sample mean of the data and 1), and `a1` and `b1` are shape and rate parameters of the inverse gamma hyperprior on `b0` (default are 2 and the sample variance of the data). See details. If `model = 'LS'` (location-scale mixture) and `y` is univariate: `m0` and `k0` are the mean parameter and the scale factor defining the normal base measure on the location parameter (default are the sample mean of the data and 1), and `a0` and `b0` are shape and scale parameters of the inverse gamma base measure on the scale parameter (default are 2 and the sample variance of the data). If `hyper = TRUE`, optional hyperpriors on the base measure's parameters are added: specifically, `m1` and `s21` are mean and variance parameters of the normal hyperprior on `m0` (default are the sample mean and the sample variance of the data); `tau1` and `zeta1` are shape and rate parameters of the gamma hyperprior on `k0` (default is 1 for both); `a1` and `b1` are shape and rate parameters of the gamma hyperprior on `b0` (default are the sample variance of the data and 1). See details. If `model = 'L'` (location mixture) and `y` is multivariate (`p`-variate): `m0` and `S20` are mean and covariance of the base measure on the location parameter (default are the sample mean and the sample covariance of the data); `Sigma0` and `n0` are the parameters of the inverse Whishart prior on the common scale matrix (default are the sample covariance of the data and `p`+2). If `hyper = TRUE`, optional hyperpriors on the base measure's parameters are added: specifically, `m1` and `k1` are the mean parameter and the scale factor defining the normal hyperprior on `m0` (default are the sample mean of the data and 1), and `lambda` and `Lambda1` are the parameters (degrees of freedom and scale) of the inverse Wishart prior on `S20` (default are `p`+2 and the sample covariance of the data). See details. If `model = 'LS'` (location-scale mixture) and `y` is multivariate (`p`-variate): `m0` and `k0` are the mean parameter and the scale factor defining the normal base measure on the location parameter (default are the sample mean of the data and 1), and `n0` and `Sigma0` are the parameters (degrees of freedom and scale) of the inverse Wishart base measure on the location parameter (default are `p`+2 and the sample covariance of the data). If `hyper = TRUE`, optional hyperpriors on the base measure's parameters are added: specifically, `m1` and `S1` are mean and covariance matrix parameters of the normal hyperprior on `m0` (default are the sample mean and the sample covariance of the data); `tau1` and `zeta1` are shape and rate parameters of the gamma hyperprior on `k0` (default is 1 for both); `n1` and `Sigma1` are the parameters (degrees of freedom and scale) of the Wishart prior for `Sigma0` (default are `p`+2 and the sample covariance of the data divided `p`+2). See details. If `model = 'DLS'` (diagonal location-scale mixture): `m0` and `k0` are the mean vector parameter and the vector of scale factors defining the normal base measure on the location parameter (default are the sample mean and a vector of ones), and `a0` and `b0` are vectors of shape and scale parameters defining the base measure on the scale parameters (default are a vector of twos and the diagonal of the sample covariance of the data). If `hyper = TRUE`, optional hyperpriors on the base measure's parameters are added: specifically, `m1` and `s21` are vectors of mean and variance parameters for the normal hyperpriors on the components of `m0` (default are the sample mean and the diagonal of the sample covariance of the data); `tau1` and `zeta1` are vectors of shape and rate parameters of the gamma hyperpriors on the components of `k0` (default is a vector of ones for both); `a1` and `b1` are vectors of shape and rate parameters of the gamma hyperpriors on the components of `b0` (default is the diagonal of the sample covariance of the data and a vector of ones). See details.
`output`	a list of arguments for generating posterior output. It contains: `grid`, a grid of points at which to evaluate the estimated posterior mean density; a data frame obtained with the `expand.grid` function. `out_param`, if equal to `TRUE`, the function returns the draws of the kernel's parameters for each MCMC iteration, default is `FALSE`. See `value` for details. `out_type`, if `out_type = "FULL"`, the function returns the visited partitions and the realizations of the posterior density for each iterations. If `out_type = "MEAN"`, the function returns the estimated partitions and the mean of the densities sampled at each iterations. If `out_type = "CLUST"`, the function returns the estimated partition. Default `"FULL"`.

Details

This generic function fits a Pitman-Yor process mixture model for density estimation and clustering. The general model is

\tilde f(y) = \int K(y; \theta) \tilde p (d \theta),

where K(y; \theta) is a kernel density with parameter \theta\in\Theta. Univariate and multivariate Gaussian kernels are implemented with different specifications for the parametric space \Theta, as described below. The mixing measure \tilde p has a Pitman-Yor process prior with strength parameter \vartheta, discount parameter \alpha, and base measure P_0 admitting the specifications presented below. For posterior sampling, three MCMC approaches are implemented. See details below.

Univariate data

For univariate y the function implements both a location and location-scale mixture model. The former assumes

\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu) \pi(\sigma^2),

where \phi(y; \mu, \sigma^2) is a univariate Gaussian kernel function with mean \mu and variance \sigma^2, and \pi(\sigma^2) is an inverse gamma prior. The base measure is specified as

P_0(d \mu) = N(d \mu; m_0, \sigma^2_0),

and \sigma^2 \sim IGa(a_0, b_0). Optional hyperpriors for the base measure's parameters are

(m_0,\sigma^2_0) \sim N(m_1, \sigma^2_0 / k_1) \times IGa(a_1, b_1).

The location-scale mixture model, instead, assumes

\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu, d \sigma^2)

with normal-inverse gamma base measure

P_0 (d \mu, d \sigma^2) = N(d \mu; m_0, \sigma^2 / k_0) \times IGa(d \sigma^2; a_0, b_0).

and (optional) hyperpriors

m_0 \sim N(m_1, \sigma_1^2 ),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad b_0 \sim Ga(a_1, b_1).

Multivariate data

For multivariate y (p-variate) the function implements a location mixture model (with full covariance matrix) and two different location-scale mixture models, with either full or diagonal covariance matrix. The location mixture model assumes

\tilde f(y) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu) \pi(\Sigma)

where \phi_p(y; \mu, \Sigma) is a p-dimensional Gaussian kernel function with mean vector \mu and covariance matrix \Sigma. The prior on \Sigma is inverse Whishart with parameters \Sigma_0 and \nu_0, while the base measure is

P_0(d \mu) = N(d \mu; m_0, S_0),

with optional hyperpriors

m_0 \sim N(m_1, S_0 / k_1),\quad S_0 \sim IW(\lambda_1, \Lambda_1).

The location-scale mixture model assumes

\tilde f(x) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu, d \Sigma).

Two possible structures for \Sigma are implemented, namely full and diagonal covariance. For the full covariance mixture model, the base measure is the normal-inverse Wishart

P_0 (d \mu, d \Sigma) = N(d \mu; m_0, \Sigma / k_0) \times IW(d \Sigma; \nu_0, \Sigma_0),

with optional hyperpriors

m_0 \sim N(m_1, S_1),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad b_0 \sim W(\nu_1, \Sigma_1).

The second location-scale mixture model assumes a diagonal covariance structure. This is equivalent to write the mixture model as a mixture of products of univariate normal kernels, i.e.

\tilde f(y) = \int \prod_{r=1}^p \phi(y_r; \mu_r, \sigma^2_r) \tilde p (d \mu_1,\ldots,d \mu_p, d \sigma_1^2,\ldots,d \sigma_p^2).

For this specification, the base measure is assumed defined as the product of p independent normal-inverse gamma distributions, that is

P_0 = \prod_{r=1}^p P_{0r}

where

P_{0r}(d \mu_r,d \sigma_r^2) = N(d \mu_r; m_{0r}, \sigma^2_r / k_{0r}) \times Ga(d \sigma^2_r; a_{0r}, b_{0r}).

Optional hyperpriors can be added, and, for each component, correspond to the set of hyperpriors considered for the univariate location-scale mixture model.

Posterior simulation methods

This generic function implements three types of MCMC algorithms for posterior simulation. The default method is the importance conditional sampler 'ICS' (Canale et al. 2019). Other options are the marginal sampler 'MAR' (Neal, 2000) and the slice sampler 'SLI' (Kalli et al. 2011). The importance conditional sampler performs an importance sampling step when updating the values of individual parameters \theta, which requires to sample m_imp values from a suitable proposal. Large values of m_imp are known to improve the mixing of the chain at the cost of increased running time (Canale et al. 2019). Two options are available for the slice sampler, namely the dependent slice-efficient sampler (slice_type = 'DEP'), which is set as default, and the independent slice-efficient sampler (slice_type = 'INDEP') (Kalli et al. 2011). See Corradin et al. (to appear) for more details.

Value

A BNPdens class object containing the estimated density and the cluster allocations for each iterations. If out_param = TRUE the output contains also the kernel specific parameters for each iteration. If mcmc_dens = TRUE the output contains also a realization from the posterior density for each iteration. IF mean_dens = TRUE the output contains just the mean of the realizations from the posterior density. The output contains also informations as the number of iterations, the number of burn-in iterations, the used computational time and the type of estimated model (univariate = TRUE or FALSE).

References

Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147

Corradin, R., Canale, A., Nipoti, B. (2021), BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures, Journal of Statistical Software, 100, doi:10.18637/jss.v100.i15

Kalli, M., Griffin, J. E., and Walker, S. G. (2011), Slice sampling mixture models. Statistics and Computing 21, 93-105, doi:10.1007/s11222-009-9150-y

Neal, R. M. (2000), Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics 9, 249-265, doi:10.2307/1390653

Examples

data_toy <- cbind(c(rnorm(100, -3, 1), rnorm(100, 3, 1)),
                  c(rnorm(100, -3, 1), rnorm(100, 3, 1)))
grid <- expand.grid(seq(-7, 7, length.out = 50),
                    seq(-7, 7, length.out = 50))
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 200, nburn = 100),
output = list(grid = grid))
summary(est_model)
plot(est_model)

[Package BNPmix version 1.0.2 Index]