PYregression {BNPmix}    R Documentation
MCMC for Pitman-Yor mixture of Gaussian regressions
Description
The PYregression
function generates a posterior sample
for mixtures of linear regression models inspired by the ANOVA-DDP model
introduced in De Iorio et al. (2004). See details below for model specification.
Usage
PYregression(y, x, mcmc = list(), prior = list(), output = list())
Arguments
y: a vector of observations, the univariate dependent variable;

x: a matrix of observations, the multivariate independent variable;

mcmc: a list of MCMC arguments;

prior: a list giving the prior information;

output: a list of posterior summaries.
Details
This function fits a Pitman-Yor process mixture of Gaussian linear regression models, i.e.,

\tilde f(y) = \int \phi(y; x^T \beta, \sigma^2) \tilde p(d\beta, d\sigma^2),

where x is a bivariate vector containing the independent variable and a value of 1 for the intercept term. The mixing measure \tilde p has a Pitman-Yor process prior with strength parameter \vartheta and discount parameter \alpha. The location model assumes a base measure P_0 specified as

P_0(d\beta) = N(d\beta; m_0, S_0),

while the location-scale model assumes a base measure P_0 specified as

P_0(d\beta, d\sigma^2) = N(d\beta; m_0, S_0) \times IGa(d\sigma^2; a_0, b_0).
Optional hyperpriors complete the model specification:
m_0 \sim N(m_1, S_0 / k_1 ),\quad S_0 \sim IW(\nu_1, S_1),\quad b_0 \sim G(\tau_1, \zeta_1).
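The location-scale base measure above can be sketched in base R. This is an illustrative sketch only: the hyperparameter values for m_0, S_0, a_0 and b_0 are chosen arbitrarily here (and S_0 is taken diagonal for simplicity), not taken from the package defaults.

```r
# Sketch: one draw from the location-scale base measure
# P_0(d beta, d sigma^2) = N(beta; m0, S0) x IGa(sigma^2; a0, b0),
# with bivariate beta = (intercept, slope). Values are illustrative.
m0 <- c(0, 0)      # prior mean of the regression coefficients
S0 <- diag(2)      # prior covariance, assumed diagonal for this sketch
a0 <- 2
b0 <- 1            # inverse-gamma shape and scale for sigma^2

draw_P0 <- function() {
  # independent normal draws are valid here because S0 is diagonal
  beta <- m0 + sqrt(diag(S0)) * rnorm(2)
  # an IGa(a0, b0) draw is the reciprocal of a Gamma(a0, rate = b0) draw
  sigma2 <- 1 / rgamma(1, shape = a0, rate = b0)
  list(beta = beta, sigma2 = sigma2)
}

draw <- draw_P0()
```

The location model corresponds to dropping the sigma^2 component and keeping only the normal draw for beta.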
Posterior simulation methods
This function implements three types of MCMC algorithms for posterior simulation. The default method is the importance conditional sampler 'ICS' (Canale et al., 2019). Other options are the marginal sampler 'MAR' (algorithm 8 of Neal, 2000) and the slice sampler 'SLI' (Kalli et al., 2011). The importance conditional sampler performs an importance sampling step when updating the values of individual parameters \theta, which requires sampling m_imp values from a suitable proposal. Large values of m_imp are known to improve the mixing of the chain at the cost of increased running time (Canale et al., 2019). When updating the individual parameters \theta, algorithm 8 of Neal (2000) requires sampling m_marginal values from the base measure; m_marginal can be chosen arbitrarily. Two options are available for the slice sampler, namely the dependent slice-efficient sampler (slice_type = 'DEP'), which is the default, and the independent slice-efficient sampler (slice_type = 'INDEP') (Kalli et al., 2011). See Corradin et al. (2021) for more details.
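The samplers and tuning parameters described above are selected through the mcmc argument. A sketch, assuming the element names used in this page (method, m_imp, m_marginal, slice_type) together with the iteration counts from the example below:

```r
# Sketch: choosing the posterior simulation method via the mcmc list.
# 'ICS' (default), 'MAR' and 'SLI' are the three samplers described above;
# m_imp, m_marginal and slice_type are the tuning parameters they use.
mcmc_ics <- list(niter = 5000, nburn = 1000, method = "ICS", m_imp = 10)
mcmc_mar <- list(niter = 5000, nburn = 1000, method = "MAR", m_marginal = 10)
mcmc_sli <- list(niter = 5000, nburn = 1000, method = "SLI", slice_type = "DEP")
```

Any one of these lists would then be supplied as the mcmc argument of PYregression.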
Value
A BNPdens class object containing the estimated density and the cluster allocations for each iteration. The output also contains the data and the grids. If out_param = TRUE, the output also contains the kernel-specific parameters for each iteration. If mcmc_dens = TRUE, the function also returns a realization from the posterior density for each iteration. If mean_dens = TRUE, the output contains just the mean of the densities sampled at each iteration. The output also returns the number of iterations, the number of burn-in iterations, the computational time and the type of model.
References
Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147
Corradin, R., Canale, A., Nipoti, B. (2021), BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures, Journal of Statistical Software, doi:10.18637/jss.v100.i15
De Iorio, M., Mueller, P., Rosner, G.L., and MacEachern, S. (2004), An ANOVA Model for Dependent Random Measures, Journal of the American Statistical Association 99, 205-215, doi:10.1198/016214504000000205
Kalli, M., Griffin, J. E., and Walker, S. G. (2011), Slice sampling mixture models. Statistics and Computing 21, 93-105, doi:10.1007/s11222-009-9150-y
Neal, R. M. (2000), Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics 9, 249-265, doi:10.2307/1390653
Examples
# toy data: two groups with the same covariate distribution but
# different regression lines (slopes 2 and 6, common intercept 1)
x_toy <- c(rnorm(100, 3, 1), rnorm(100, 3, 1))
y_toy <- c(x_toy[1:100] * 2 + 1, x_toy[101:200] * 6 + 1) + rnorm(200, 0, 1)
# grids over which the posterior density is evaluated
grid_x <- c(0, 1, 2, 3, 4, 5)
grid_y <- seq(0, 35, length.out = 50)
est_model <- PYregression(y = y_toy, x = x_toy,
                          mcmc = list(niter = 200, nburn = 100),
                          output = list(grid_x = grid_x, grid_y = grid_y))
summary(est_model)
plot(est_model)