DDPdensity {BNPmix}    R Documentation

MCMC for GM-dependent Dirichlet process mixtures of Gaussians

Description

The DDPdensity function generates posterior density samples for a univariate Griffiths-Milne dependent Dirichlet process mixture model with Gaussian kernel, for partially exchangeable data. The function implements the importance conditional sampler method.

Usage

DDPdensity(y, group, mcmc = list(), prior = list(), output = list())

Arguments

y

a vector or matrix giving the data based on which densities are to be estimated;

group

vector of length length(y) containing the group labels (integers) for the elements of y;
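As a minimal sketch of how y and group line up, the snippet below builds integer labels from string labels. The data, the label names and the variable raw_labels are hypothetical and only serve the illustration.

```r
# Hypothetical toy data: 60 observations from two groups of unequal size.
set.seed(1)
y <- c(rnorm(40, mean = -2), rnorm(20, mean = 2))

# Suppose the raw labels are strings; map them to integer codes 1, 2, ...
raw_labels <- c(rep("control", 40), rep("treated", 20))
group <- as.integer(factor(raw_labels))

# One integer label per observation, as the argument description requires.
stopifnot(length(group) == length(y))
table(group)
```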

mcmc

list of MCMC arguments:

  • niter (mandatory), number of iterations.

  • nburn (mandatory), number of iterations to discard as burn-in.

  • nupd, argument controlling the number of iterations to be displayed on screen: the function reports on standard output every time nupd new iterations have been carried out (default is niter/10).

  • print_message, control option. If equal to TRUE, the status is printed to standard output every nupd iterations (default is TRUE).

  • m_imp, number of generated values for the importance sampling step of the importance conditional sampler (default is 10). See details.

  • var_MH_step, variance of the Gaussian proposal used in the Metropolis-Hastings update of the weights (default is 0.25).
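A possible mcmc list, spelling out every documented element, might look as follows. The numeric values and the name mcmc_args are illustrative assumptions; the final check that nburn is smaller than niter is a sanity check added here, not something stated in this page.

```r
niter <- 2000

mcmc_args <- list(
  niter = niter,           # mandatory: total number of iterations
  nburn = 1000,            # mandatory: iterations discarded as burn-in
  nupd = niter / 10,       # documented default, written out explicitly
  print_message = FALSE,   # silence the progress report
  m_imp = 10,              # documented default
  var_MH_step = 0.25       # documented default
)

# Sanity check (an assumption of this sketch): some iterations must survive burn-in.
stopifnot(mcmc_args$nburn < mcmc_args$niter)
```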

prior

a list giving the prior information, which contains:

  • strength, the strength parameter, or total mass, of the marginal Dirichlet processes (default 1);

  • m0, mean of the normal base measure on the location parameter (default is the sample mean of the data);

  • k0, scale factor appearing in the normal base measure on the location parameter (default 1);

  • a0, shape parameter of the inverse gamma base measure on the scale parameter (default 2);

  • b0, scale parameter of the inverse gamma base measure on the scale parameter (default is the sample variance of the data);

  • wei, parameter controlling the strength of dependence across Dirichlet processes (default 1/2).
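The sketch below writes the documented defaults out explicitly, which can help when only one or two of them need changing. The toy data and the name prior_args are assumptions for illustration.

```r
# Hypothetical data used only to compute the data-dependent defaults.
set.seed(1)
y <- c(rnorm(50, -4, 1), rnorm(50, 4, 1))

prior_args <- list(
  strength = 1,      # total mass of each marginal Dirichlet process
  m0 = mean(y),      # default: sample mean of the data
  k0 = 1,            # scale factor in the normal base measure
  a0 = 2,            # shape of the inverse gamma base measure
  b0 = var(y),       # default: sample variance of the data
  wei = 0.5          # dependence across groups
)

# wei is described in Details as taking values in (0, 1).
stopifnot(prior_args$wei > 0, prior_args$wei < 1)
```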

output

a list of arguments for generating posterior output. It contains:

  • grid, a grid of points at which to evaluate the estimated posterior mean densities (common for all the groups).

  • out_type, type of output returned. If out_type = "FULL", return the estimated partitions and the realizations of the posterior density at each iteration. If out_type = "MEAN", return the estimated partitions and the mean of the densities sampled at each iteration. If out_type = "CLUST", return only the estimated partitions. Default is out_type = "FULL".
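One way to build the output list is to derive the grid from the data range, as sketched here. Padding the range by one standard deviation is an arbitrary choice made for this illustration, and y and output_args are hypothetical names.

```r
# Hypothetical data; the grid is shared by all groups.
set.seed(1)
y <- c(rnorm(50, -4, 1), rnorm(50, 4, 1))

# Pad the observed range by one standard deviation on each side (arbitrary choice).
pad  <- sd(y)
grid <- seq(min(y) - pad, max(y) + pad, length.out = 50)

output_args <- list(grid = grid, out_type = "MEAN")

# Only the three documented values are meaningful for out_type.
stopifnot(output_args$out_type %in% c("FULL", "MEAN", "CLUST"))
```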

Details

This function fits a Griffiths-Milne dependent Dirichlet process (GM-DDP) mixture for density estimation with partially exchangeable data (Lijoi et al., 2014). The group variable assigns each observation to one of L = length(unique(group)) distinct groups. The model assumes exchangeability within each group, with observations in the l-th group marginally modelled by a location-scale Dirichlet process mixture, i.e.

\tilde f_l(y) = \int \phi(y; \mu, \sigma^2) \tilde p_l (d \mu, d \sigma^2)

where each \tilde p_l is a Dirichlet process with total mass strength and base measure P_0. The vector \tilde p = (\tilde p_1, \ldots, \tilde p_L) is assumed to be jointly distributed as a vector of GM-DDP(strength, wei; P_0), where strength and P_0 are the total mass parameter and the base measure of each \tilde p_l, and wei controls the dependence across the components of \tilde p. Admissible values for wei are in (0, 1), with the two extremes of the range corresponding to full exchangeability (wei \rightarrow 0) and independence across groups (wei \rightarrow 1).
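Since a Dirichlet process is almost surely discrete, the integral defining \tilde f_l reduces to a weighted sum of Gaussian densities over the atoms of \tilde p_l. The sketch below evaluates such a sum for a hypothetical mixing measure with three atoms; the weights, atoms and the helper mixture_density are made up for illustration and are not part of BNPmix.

```r
# Hypothetical atoms (mu_j, sigma2_j) and weights w_j of a discrete mixing measure.
w      <- c(0.5, 0.3, 0.2)
mu     <- c(-4, 0, 4)
sigma2 <- c(1, 0.5, 2)

# f(y) = sum_j w_j * phi(y; mu_j, sigma2_j), evaluated at each element of y.
mixture_density <- function(y, w, mu, sigma2) {
  vapply(
    y,
    function(yi) sum(w * dnorm(yi, mean = mu, sd = sqrt(sigma2))),
    numeric(1)
  )
}

grid <- seq(-12, 12, length.out = 2001)
f    <- mixture_density(grid, w, mu, sigma2)

# Riemann-sum check: a density should integrate to approximately 1.
total_mass <- sum(f) * diff(grid)[1]
total_mass
```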

P_0 is a normal-inverse gamma base measure, i.e.

P_0(d\mu, d\sigma^2) = N(d\mu; m_0, \sigma^2 / k_0) \times IGa(d\sigma^2; a_0, b_0).
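To get a feel for the base measure, one can simulate from it in base R. This sketch assumes IGa(a_0, b_0) denotes the inverse gamma law with shape a_0 and scale b_0, so that 1/\sigma^2 is gamma distributed with shape a_0 and rate b_0. The helper rP0 and the hyperparameter values are hypothetical.

```r
# Draw n pairs (mu, sigma2) from the normal-inverse gamma base measure.
rP0 <- function(n, m0, k0, a0, b0) {
  # sigma2 ~ IGa(a0, b0), i.e. 1 / sigma2 ~ Gamma(shape = a0, rate = b0)
  sigma2 <- 1 / rgamma(n, shape = a0, rate = b0)
  # mu | sigma2 ~ N(m0, sigma2 / k0)
  mu <- rnorm(n, mean = m0, sd = sqrt(sigma2 / k0))
  data.frame(mu = mu, sigma2 = sigma2)
}

set.seed(42)
draws <- rP0(10000, m0 = 0, k0 = 1, a0 = 3, b0 = 2)

# For a0 > 1 the mean of sigma2 is b0 / (a0 - 1), here 1;
# the sample mean should be close to that value.
mean(draws$sigma2)
```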

Posterior sampling is carried out with the importance conditional sampler (Canale et al., 2019). See Corradin et al. (2021) for more details.

Value

A BNPdensity class object containing the estimated densities for each iteration; the allocations for each iteration; the grid used to evaluate the densities (for each group); the densities sampled from the posterior distribution (for each group); the groups; and the weights of the processes. The function also returns information regarding the estimation: the number of iterations, the number of burn-in iterations and the execution time.
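The component names of the returned object are not listed on this page, so a cautious way to explore it is with generic base R tools, as sketched below. The call is guarded so that the snippet still runs when BNPmix is not installed; the seed and the object name fit are arbitrary.

```r
set.seed(123)
data_toy  <- c(rnorm(50, -4, 1), rnorm(100, 0, 1), rnorm(50, 4, 1))
group_toy <- c(rep(1, 100), rep(2, 100))
grid      <- seq(-7, 7, length.out = 50)

fit <- NULL
if (requireNamespace("BNPmix", quietly = TRUE)) {
  fit <- BNPmix::DDPdensity(
    y = data_toy, group = group_toy,
    mcmc = list(niter = 200, nburn = 100, print_message = FALSE),
    output = list(grid = grid, out_type = "MEAN")
  )
  print(class(fit))          # class of the returned object
  str(fit, max.level = 1)    # top-level structure only
}
```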

References

Lijoi, A., Nipoti, B., and Pruenster, I. (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli 20, 1260–1291, doi:10.3150/13-BEJ521

Canale, A., Corradin, R., and Nipoti, B. (2019). Importance conditional sampling for Bayesian nonparametric mixtures. arXiv preprint arXiv:1906.08147

Corradin, R., Canale, A., and Nipoti, B. (2021). BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures. Journal of Statistical Software, doi:10.18637/jss.v100.i15

Examples

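library(BNPmix)

# Toy data: 200 draws from a three-component Gaussian mixture
data_toy <- c(rnorm(50, -4, 1), rnorm(100, 0, 1), rnorm(50, 4, 1))
# First 100 observations in group 1, last 100 in group 2
group_toy <- c(rep(1, 100), rep(2, 100))
# Common grid on which the group densities are evaluated
grid <- seq(-7, 7, length.out = 50)
est_model <- DDPdensity(y = data_toy, group = group_toy,
                        mcmc = list(niter = 200, nburn = 100, var_MH_step = 0.25),
                        output = list(grid = grid))
summary(est_model)
plot(est_model)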


[Package BNPmix version 1.0.2 Index]