R: MCMC for GM-dependent Dirichlet process mixtures of Gaussians

DDPdensity {BNPmix}

R Documentation

MCMC for GM-dependent Dirichlet process mixtures of Gaussians

Description

The DDPdensity function generates posterior density samples for a univariate Griffiths-Milne dependent Dirichlet process mixture model with a Gaussian kernel, for partially exchangeable data. Posterior sampling is carried out with the importance conditional sampler.

Usage

DDPdensity(y, group, mcmc = list(), prior = list(), output = list())

Arguments

`y`	a vector or matrix giving the data based on which densities are to be estimated;
`group`	vector of length `length(y)` containing the group labels (integers) for the elements of `y`;
`mcmc`	list of MCMC arguments: `niter` (mandatory), number of iterations. `nburn` (mandatory), number of iterations to discard as burn-in. `nupd`, reporting interval: the function reports its progress on standard output every time `nupd` new iterations have been carried out (default is `niter/10`). `print_message`, control option: if equal to `TRUE`, the status is printed to standard output every `nupd` iterations (default is `TRUE`). `m_imp`, number of values generated for the importance sampling step of the importance conditional sampler (default is 10); see Details. `var_MH_step`, variance of the Gaussian proposal for the Metropolis-Hastings step that updates the weights (default is 0.25).
`prior`	a list giving the prior information, which contains: `strength`, the strength parameter, or total mass, of the marginal Dirichlet processes (default 1); `m0`, mean of the normal base measure on the location parameter (default is the sample mean of the data); `k0`, scale factor appearing in the normal base measure on the location parameter (default 1); `a0`, shape parameter of the inverse gamma base measure on the scale parameter (default 2); `b0`, scale parameter of the inverse gamma base measure on the scale parameter (default is the sample variance of the data); `wei`, parameter controlling the strength of dependence across Dirichlet processes (default 1/2).
`output`	a list of arguments for generating posterior output. It contains: `grid`, a grid of points at which to evaluate the estimated posterior mean densities (common to all groups). `out_type`: if `out_type = "FULL"`, return the estimated partitions and the realizations of the posterior density for each iteration; if `out_type = "MEAN"`, return the estimated partitions and the mean of the densities sampled at each iteration; if `out_type = "CLUST"`, return the estimated partitions only. Default is `out_type = "FULL"`.
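As a minimal sketch of how the three list arguments fit together, the snippet below builds `mcmc`, `prior` and `output` lists using only the element names documented above. The toy data, the object names (`mcmc_args`, `prior_args`, `output_args`) and the numeric values are illustrative assumptions rather than recommended settings, and the call is guarded so that it runs only if BNPmix is installed.

```r
# Toy data: two groups of 100 observations each (illustrative assumption).
set.seed(1)
y <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
group <- rep(1:2, each = 100)

# MCMC settings: niter and nburn are mandatory, the others are optional.
mcmc_args <- list(niter = 500, nburn = 250, nupd = 100,
                  print_message = FALSE, m_imp = 10, var_MH_step = 0.25)

# Prior settings, written out explicitly to mirror the documented defaults.
prior_args <- list(strength = 1, m0 = mean(y), k0 = 1,
                   a0 = 2, b0 = var(y), wei = 0.5)

# Output settings: common evaluation grid, posterior mean densities only.
output_args <- list(grid = seq(-6, 6, length.out = 50), out_type = "MEAN")

# Fit the model only when the package is available.
if (requireNamespace("BNPmix", quietly = TRUE)) {
  fit <- BNPmix::DDPdensity(y = y, group = group, mcmc = mcmc_args,
                            prior = prior_args, output = output_args)
}
```

Any element left out of `prior` falls back to the default listed above, so in practice only the values that differ from the defaults need to be supplied.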

Details

This function fits a Griffiths-Milne dependent Dirichlet process (GM-DDP) mixture for density estimation with partially exchangeable data (Lijoi et al., 2014). The group variable assigns each observation to one of L=length(unique(group)) distinct groups. The model assumes exchangeability within each group, with observations in the lth group marginally modelled by a location-scale Dirichlet process mixture, i.e.

\tilde f_l(y) = \int \phi(y; \mu, \sigma^2) \tilde p_l (d \mu, d \sigma^2)

where each \tilde p_l is a Dirichlet process with total mass strength and base measure P_0. The vector \tilde p = (\tilde p_1,\ldots,\tilde p_L) is assumed to be jointly distributed as a vector of GM-DDP(strength, wei; P_0), where strength and P_0 are the total mass parameter and the base measure of each \tilde p_l, and wei controls the dependence across the components of \tilde p. Admissible values for wei are in (0,1), with the two extremes of the range corresponding to full exchangeability (wei\rightarrow 0) and independence across groups (wei\rightarrow 1).
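Because a Dirichlet process is almost surely discrete, the integral defining \tilde f_l reduces to a weighted sum of Gaussian kernels over the atoms of \tilde p_l. The sketch below evaluates such a sum in base R for a finite set of atoms. It assumes a truncated, already sampled \tilde p_l; `mixture_density` is a hypothetical helper, not a BNPmix function, and the weights and atoms are made-up values.

```r
# Evaluate f(y) = sum_j w_j * phi(y; mu_j, sigma2_j) on a grid of points.
# mixture_density is a hypothetical helper, not part of BNPmix.
mixture_density <- function(grid, weights, mu, sigma2) {
  stopifnot(length(weights) == length(mu), length(mu) == length(sigma2))
  weights <- weights / sum(weights)  # make sure the weights sum to one
  # One column per component: column j holds phi(grid; mu_j, sigma2_j).
  kernels <- sapply(seq_along(mu), function(j) {
    dnorm(grid, mean = mu[j], sd = sqrt(sigma2[j]))
  })
  as.vector(kernels %*% weights)
}

# Illustrative atoms and weights for a single group (assumed values).
grid <- seq(-7, 7, length.out = 200)
f <- mixture_density(grid, weights = c(0.5, 0.3, 0.2),
                     mu = c(-4, 0, 4), sigma2 = c(1, 1, 1))
```

In the GM-DDP model the L group-specific measures are dependent, so this helper only illustrates the kernel mixture for one group at a time.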

P_0 is a normal-inverse gamma base measure, i.e.

P_0(d\mu,d\sigma^2) = N(d \mu; m_0, \sigma^2 / k_0) \times IGa(d \sigma^2; a_0, b_0).
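To make the base measure concrete, the following base R sketch draws pairs (\mu, \sigma^2) from P_0. It assumes that IGa(a_0, b_0) denotes the shape-scale inverse gamma, so that 1/\sigma^2 follows a gamma distribution with shape a_0 and rate b_0. `r_base_measure` is a hypothetical helper, not a BNPmix function, and the hyperparameter values are illustrative.

```r
# Draw n pairs (mu, sigma2) from the normal-inverse gamma base measure P_0.
# r_base_measure is a hypothetical helper, not part of BNPmix.
r_base_measure <- function(n, m0, k0, a0, b0) {
  # sigma2 ~ IGa(a0, b0): reciprocal of a Gamma(shape = a0, rate = b0) draw.
  sigma2 <- 1 / rgamma(n, shape = a0, rate = b0)
  # mu | sigma2 ~ N(m0, sigma2 / k0).
  mu <- rnorm(n, mean = m0, sd = sqrt(sigma2 / k0))
  data.frame(mu = mu, sigma2 = sigma2)
}

set.seed(42)
atoms <- r_base_measure(1000, m0 = 0, k0 = 1, a0 = 2, b0 = 1)
head(atoms)
```

Larger values of k_0 concentrate the locations around m_0, while a_0 and b_0 govern the spread of the kernel variances.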

Posterior samples are obtained with the importance conditional sampler (Canale et al., 2019). See Corradin et al. (2021) for more details.

Value

A BNPdensity class object containing the estimated densities for each iteration; the allocations for each iteration; the grid used to evaluate the densities (for each group); the densities sampled from the posterior distribution (for each group); the groups; and the weights of the processes. The function also returns information about the estimation: the number of iterations, the number of burn-in iterations and the execution time.

References

Lijoi, A., Nipoti, B., and Pruenster, I. (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli 20, 1260–1291, doi:10.3150/13-BEJ521

Canale, A., Corradin, R., and Nipoti, B. (2019). Importance conditional sampling for Bayesian nonparametric mixtures. arXiv preprint arXiv:1906.08147

Corradin, R., Canale, A., and Nipoti, B. (2021). BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures. Journal of Statistical Software, doi:10.18637/jss.v100.i15

Examples

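# Simulate 200 observations from a three-component Gaussian mixture
data_toy <- c(rnorm(50, -4, 1), rnorm(100, 0, 1), rnorm(50, 4, 1))
# Assign the first 100 observations to group 1 and the last 100 to group 2
group_toy <- c(rep(1, 100), rep(2, 100))
# Common grid on which the group densities are evaluated
grid <- seq(-7, 7, length.out = 50)
# Fit the GM-DDP mixture with a short, illustrative chain
est_model <- DDPdensity(y = data_toy, group = group_toy,
                        mcmc = list(niter = 200, nburn = 100, var_MH_step = 0.25),
                        output = list(grid = grid))
summary(est_model)
plot(est_model)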

[Package BNPmix version 1.0.2 Index]