densEstBayes {densEstBayes} | R Documentation |
Density estimation via Bayesian inference engines
Description
Construction of a high quality density estimate from a random sample is a fundamental problem in Statistics. This function delivers solutions to this problem by calling upon Bayesian inference engine algorithms with roots in the Machine Learning literature. Both stochastic and faster deterministic algorithms are provided. The stochastic options are Hamiltonian Monte Carlo and the no U-turn sampler (through the Stan inference engine) and slice sampling. The deterministic option is semiparametric mean field variational Bayes. The last of these options provides a very quick Bayesian density estimate, whereas the other options take longer to compute but, according to extensive simulation testing, are highly accurate. The essence of the approach is conversion of the density estimation problem to a Bayesian Poisson mixed model problem via binning on a fine grid. An attractive by-product of the binning approach is the ability to handle very large sample sizes. The number of bins, rather than the sample size, is the limiting factor.
Usage
densEstBayes(x,method,verbose,control)
Arguments
x |
Vector containing a continuous univariate data set. |
method |
Character string for specifying the method to be used: |
verbose |
Boolean variable for specifying whether or not progress messages are printed to the console. The default is TRUE. |
control |
Function for controlling Markov chain Monte Carlo sample sizes and other specifications. |
Details
The crux of the Bayesian density estimation approach is to bin the input data on a fine equally-spaced grid and treat the bin counts as response data in a Bayesian Poisson mixed model-based penalized splines model. Section 8 of Eilers and Marx (1996) provides details on couching the density estimation problem as a Poisson penalized splines problem.
Fitting and inference for the resultant Bayesian Poisson mixed model is carried out using one of four possible approaches as specified by the method
argument; one of which is deterministic ("SMFVB") and three of which are stochastic ("HMC", "NUTS" and "slice").
The settings method
= "HMC" and method
= "NUTS" correspond, respectively, to the Markov chain Monte Carlo approaches Hamiltonian Monte Carlo (e.g. Neal, 2011) and the no-U-turn sampler (Hoffman and Gelman, 2014) and fitting and inference is performed using the Stan Bayesian inference engine via the rstan
package (Stan Development Team, 2017).
The setting method
= "slice" corresponds to slice sampling (e.g. Neal, 2003).
The setting method
= "SMFVB" uses a semiparametric mean field variational Bayes approach as summarised in Algorithm 1 of Luts and Wand (2015). Comparatively speaking, this method leads to a very quick density estimate but at some accuracy and convergence reliability costs.
Value
An object of class densEstBayes
, which is a list with the following components:
method |
the value of |
range.x |
a numerical vector with 2 entries such that range.x[1] is the lower limit and range.x[2] is the upper limit of the interval over which the density estimate was computed. |
intKnots |
a numerical vector containing the locations of the interior knots used in the cubic O'Sullivan spline basis |
determFitObj |
if |
stochaFitObj |
if |
xname |
A character string that matches the |
sampHexTran |
Sample hexiles of the transformed input data to be used for possible checking Markov chains Monte Carlo samples. |
Author(s)
Matt P. Wand matt.wand@uts.edu.au
References
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statistical Science, 11, 89-121.
Hoffman, M.D. and Gelman, A. (2014). The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593-1623.
Minka, T. (2005). Divergence measures and message passing. Microsoft Research Technical Report Series, MSR-TR-2005-173, 1-17.
Luts, J. and Wand, M.P. (2015). Variational inference for count response semiparametric regression. Bayesian Analysis, 10, 991-1023.
Neal, R. (2003). Slice sampling (with discussion). The Annals of Statistics, 31, 705-767.
Neal, R. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, eds. S. Brooks, A. Gelman, G.L. Jones and X.-L. Meng. Boca Raton, Florida: Chapman & Hall/CRC Press.
Stan Development Team. (2020). Stan Modeling Language User's Guide and Reference Manual, https://mc-stan.org.
Wainwright, M.J. and Jordan, M.I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1, 1-305.
Wand, M.P. and Yu, J.C.F. (2020). Density estimation via Bayesian inference engines. Submitted.
Examples
library(densEstBayes) ; set.seed(1)
x <- rMarronWand(1000,8)
dest <- densEstBayes(x,method = "SMFVB")
plot(dest) ; rug(x,col = "dodgerblue",quiet = TRUE)
xg <- seq(-3,3,length = 1001)
trueDensg <- dMarronWand(xg,8)
lines(xg,trueDensg,col = "indianred3")
if (require("rstan"))
{
dest <- densEstBayes(x,method = "NUTS")
plot(dest) ; rug(x,col = "dodgerblue")
xg <- seq(-3,3,length = 1001)
trueDensg <- dMarronWand(xg,8)
lines(xg,trueDensg,col = "indianred3")
}