R: Gibbs sampling for density estimation for msBP model

msBP.Gibbs {msBP}

R Documentation

Gibbs sampling for density estimation for msBP model

Description

Gibbs sampling for Markov Chain Motecarlo sampling from the posterior distribution of an msBP model.

Usage

msBP.Gibbs(x, a, b, g0 = "normal", g0par=c(0,1), mcmc, 
	grid = list(n.points=40, low=0.001, upp=0.999), state=NULL, hyper, 
	printing=0, maxScale=5, ...)

Arguments

`x`	the observed sample
`a`	scalar a parameter
`b`	scalar b parameter
`g0`	prior guess for the density of `x`. Currently only "normal", "unif", "gamma", and "empirical" are supported. From version 1.1 random paramters are also allowed (only with `g0="normal"`).
`g0par`	additional scalar parameters for `g0`. If "normal" corresponds to mean and standard deviation, if "uniform" to upper and lower bounds, if "gamma" to shape and rate parameters. If "empirical" this value is not used. From version 1.1 random paramters are also allowed (only with `g0="normal"`).
`mcmc`	a list giving the MCMC parameters. It must include the following integers: `nb` giving the number of burn-in iterations, `nrep` giving the total number of iterations (including `nb`), and `ndisplay` giving the multiple of iterations to be displayed on screen while the `C++` routine is running (a message will be printed every `ndisplay` iterations).
`grid`	a list giving the parameters for plotting the posterior mean density over a finite grid. It must include the following values: `low` and `upp` giving the lower and upper bound respectively of the grid and `n.points`, an integer giving the number of points of the grid
`state`	a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis or if we want to start the MCMC algorithm from some particular value of the parameters.
`hyper`	a list containing the values of the hyperparameters for `a` and `b` or for the parameters of the prior guess (only if `g0="normal"`) . It must contains `hyperprior`, a list of three logical values determining if hyperpriors for `a`, `b` and `g0` are used (TRUE) or if `a`, `b`, or `g0` are fixed (FALSE), and `hyperpar` a list containing the hyperparameters for the hyperprior distributions: `beta`, `gamma`, `delta`, `lambda`, `mu0`, `kappa0`, `alpha0`, and `beta0`. See details. `gridB` is a grid of values for which the prior (and posterior) for `b` is evaluated with a Griddy Gibbs approach (Ritter and Tanner, 1992). See details.
`printing`	Vector of integers if the internal C++ function need to print what is doing
`maxScale`	maximum scale of the binary trees.
`...`	additional arguments.

Details

Before calling the proper C++ subrouting the function center the sample on an initial guess for the density of the data. If g0 = 'empirical' the data are transformed so that the expctation of the msBP prior is centered on the kernel density estimate of x.

The algorithm consists of two primary steps: (i) allocate each observation to a multiscale cluster, conditionally on the values of the weights (see also msBP.postCluster); (ii) update the weights, conditionally on the cluster allocations. All the procedure is written in C++ and additional R scripts are used to pre- and post-process the data and the output.

If hyper$hyperpriors$a or hyper$hyperpriors$b is true, additional hyperpriors for a and b are assumed. Specifically the algorithm implements a \sim Ga(\beta,\gamma) and b \sim Ga(\delta, \lambda). For the former parameter the full conditional posterior distribution is available in closed form, i.e.

a | - \sim Ga\left(\beta + 2^{s'+1} - 1, \gamma - \sum_{s=0}^{s'} \sum_{h=1}^{2^s} \log(1-S_{s,h}) \right),

while for the latter its full conditional posterior is proportional to

\frac{b^{\delta-1}}{B(b,b)^{2^{s+1}-1}} \exp \left\{b \left( \sum_{s=0}^{s'} \sum_{h=1}^{2^s} \log\{R_{s,h} (1 - R_{s,h} )\} - \lambda\right) \right\},

where s' is the maximum occupied scale and B(p, q) is the Beta function. To sample from the latter distribution, a griddy Gibbs approach over the grid defined by hyper$hyperpar$gridB is used. See Ritter and Tanner (1992). From Version 1.1, if hyper$hyperpriors$base=TRUE and g0="normal" additional hyperpriors for the parameter of the centering normal density are assumed. Specifically the model is

y = \Phi(x; \mu, \sigma^2)

(\mu, \sigma^2) \sim N(\mu; \mu_0, \kappa_0\sigma^2)\mbox{I-Ga}(\sigma^2; \alpha_0, \beta_0)

and an addtional step simulating the values of \mu and \sigma^2 from their conditional posterior distribution is added to the Gibbs sampler of Canale and Dunson (2016). Specifically, a Metropolis-Hastings step with proposal equal to the prior is implemented.

Value

A list containing

`density`	A list containing `postMeanDens`, the posterior mean density estimate evaluated over `xDens` and `postLowDens` and `postUppDens`, the lower and upper pointwise 95% credible bands,
`mcmc`	A list containing the MCMC chains: `dens` is a matrix (`nrep`-`nb`) times `n.grid`, `a` and `b` are the vectors with the MCMC chains for the two parameters (if `hyperprior` was TRUE), `scale` is a matix where each column is a MCMC chain of the total mass for each scale, `R` and `S`, are matrices where each column in the `tree2vec` form of the corresponding trees, `weights` is a matrix where each column is the `tree2vec` form of the corresponding tree of weights, `s` and `h` are matrices where each column is the MCMC chain for the node labels for a subject.
`postmean`	A list containing posterior means over the MCMC samples of `a`, `b`, and of all binary trees
`fit`	A list containing the LPML, mean and median of the log CPO.

References

Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.

Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.

Ritter C., Tanner M. (1992). "Facilitating the Gibbs Sampler: the Gibbs Stopper and the Griddy-Gibbs Sampler." Journal of the American Statistical Association, 87, 861-868.

Examples

## Not run: 
data(galaxy)
galaxy <- data.frame(galaxy)
speeds <- galaxy$speed/1000
set.seed(1)
#with fixed g0 and random a, b
fit.msbp.1 <- msBP.Gibbs(speeds, a = 10, b = 5, g0 = "empirical", 
	mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), 
	hyper=list(hyperprior=list(a = TRUE, b = TRUE, g0 = FALSE), 
	hyperpar=list(beta=5,gamma = 1,delta = 1,lambda = 1)), 
	printing = 0, maxS = 7, grid = list(n.points = 150, low = 5, upp = 38))

#with random a, b and hyperparameters of g0
fit.msbp.2 <- msBP.Gibbs(speeds, a = 10, b=5, g0 = "normal", 
	mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), 
	hyper=list(hyperprior = list(a = TRUE, b = TRUE, g0 = TRUE), 
  hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1,
	gridB = seq(0, 20, length = 30),
	mu0 = 21, kappa0 = 0.1, alpha0 = 1, beta0 = 20)), 
	printing = 0, maxS = 7, grid = list(n.points = 150, lo w= 5, upp = 38))	

hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey')
points(fit.msbp.1$density$postMeanDens~fit.msbp.1$density$xDens, ty='l', lwd=2)
points(fit.msbp.1$density$postUppDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2)
points(fit.msbp.1$density$postLowDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2)

hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey')
points(fit.msbp.2$density$postMeanDens~fit.msbp.2$density$xDens, ty='l', lwd=2)
points(fit.msbp.2$density$postUppDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2)
points(fit.msbp.2$density$postLowDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2)


## End(Not run)

[Package msBP version 1.4-1 Index]