R: Bayesian variable selection for the Poisson model

poissonBvs {pogit}

R Documentation

Bayesian variable selection for the Poisson model

Description

This function performs Bayesian variable selection for Poisson regression models via spike and slab priors. A cluster- (or observation-) specific random intercept can be included in the model to account for within-cluster dependence (or overdispersion) with variance selection of the random intercept. For posterior inference, a MCMC sampling scheme is used which relies on data augmentation and involves only Gibbs sampling steps.

Usage

poissonBvs(
  y,
  offset = NULL,
  X,
  model = list(),
  mcmc = list(),
  prior = list(),
  start = NULL,
  BVS = TRUE
)

Arguments

`y`	an integer vector of count data
`offset`	an (optional) offset term; should be `NULL` or an integer vector of length equal to the number of counts.
`X`	a design matrix (including an intercept term)
`model`	an (optional) list specifying the structure of the model (see details)
`mcmc`	an (optional) list of MCMC sampling options (see details)
`prior`	an (optional) list of prior settings and hyper-parameters controlling the priors (see details)
`start`	an (optional), numeric vector containing starting values for the regression effects (including an intercept term); defaults to `NULL` (i.e. a vector of zeros is used).
`BVS`	if `TRUE` (default), Bayesian variable selection is performed to identify regressors with a non-zero effect; otherwise, an unrestricted model is estimated (without variable selection).

Details

The method provides a Bayesian framework for variable selection in regression modelling of count data using mixture priors with a spike and a slab component to identify regressors with a non-zero effect. More specifically, a Dirac spike is used, i.e. a point mass at zero, and (by default), the slab component is specified as a scale mixture of normal distributions, resulting in a Student-t distribution with 2psi.nu degrees of freedom. In the more general random intercept model, variance selection of the random intercept is based on the non-centered parameterization of the model, where the signed standard deviation \theta_\beta of the random intercept term appears as a further regression effect in the model equation. For further details, see Wagner and Duller (2012).

The implementation of Bayesian variable selection further relies on the representation of the Poisson model as a Gaussian regression model in auxiliary variables. Data augmentation is based on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter et al. (2009), where the inter-arrival times of an assumed Poisson process are introduced as latent variables. The error distribution, a negative log-Gamma distribution, in the auxiliary model is approximated by a finite mixture of normal distributions where the mixture parameters of the matlab package bayesf, Version 2.0 of Fruehwirth-Schnatter (2007) are used. See Fruehwirth-Schnatter et al. (2009) for details.

For details concerning the sampling algorithm, see Dvorzak and Wagner (2016) and Wagner and Duller (2012).

Details for model specification (see arguments):

model:

deltafix: an indicator vector of length ncol(X)-1 specifying which regression effects are subject to selection (i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector of zeros.
gammafix: an indicator for variance selection of the random intercept term (i.e., 0 = with variance selection (default), 1 = no variance selection); only used if a random intercept is includued in the model (see ri).
ri: logical. If TRUE, a cluster- (or observation-) specific random intercept is included in the model; defaults to FALSE.
clusterID: a numeric vector of length equal to the number of observations containing the cluster ID c = 1,...,C for each observation (required if ri=TRUE). Note that seq_along(y) specifies an overdispersed Poisson model with observation-specific (normal) random intercept (see note).

prior:

slab: distribution of the slab component, i.e. "Student" (default) or "Normal".
psi.nu: hyper-parameter of the Student-t slab (used for a "Student" slab); defaults to 5.
m0: prior mean for the intercept parameter; defaults to 0.
M0: prior variance for the intercept parameter; defaults to 100.
aj0: a vector of prior means for the regression effects (which is encoded in a normal distribution, see note); defaults to vector of zeros.
V: variance of the slab; defaults to 5.
w: hyper-parameters of the Beta-prior for the mixture weight \omega; defaults to c(wa0=1, wb0=1), i.e. a uniform distribution.
pi: hyper-parameters of the Beta-prior for the mixture weight \pi; defaults to c(pa0=1, pb0=1), i.e. a uniform distribution.

mcmc:

M: number of MCMC iterations after the burn-in phase; defaults to 8000.
burnin: number of MCMC iterations discarded as burn-in; defaults to 2000.
thin: thinning parameter; defaults to 1.
startsel: number of MCMC iterations drawn from the unrestricted model (e.g., burnin/2); defaults to 1000.
verbose: MCMC progress report in each verbose-th iteration step; defaults to 500. If verbose=0, no output is generated.
msave: returns additional output with variable selection details (i.e. posterior samples for \omega, \delta, \pi, \gamma); defaults to FALSE.

Value

The function returns an object of class "pogit" with methods print.pogit, summary.pogit and plot.pogit.

The returned object is a list containing the following elements:

`samplesP`	a named list containing the samples from the posterior distribution of the parameters in the Poisson model (see also `msave`): `beta, thetaBeta` regression coefficients `\beta` and `\theta_\beta` `pdeltaBeta` P(`\delta_\beta`=1) `psiBeta` scale parameter `\psi_\beta` of the slab component `pgammaBeta` P(`\gamma_\beta`=1) `bi` cluster- (or observation-) specific random intercept
`data`	a list containing the data `y`, `offset` and `X`
`model.pois`	a list containing details on the model specification, see details for `model`
`mcmc`	see details for `mcmc`
`prior.pois`	see details for `prior`
`dur`	a list containing the total runtime (`total`) and the runtime after burn-in (`durM`), in seconds
`BVS`	see arguments
`start`	a list containing starting values, see arguments
`family`	"poisson"
`call`	function call

Note

If prior information on the regression parameters is available, this information is encoded in a normal distribution instead of the spike and slab prior (consequently, BVS is set to FALSE).

This function can also be used to accommodate overdispersion in count data by specifying an observation-specific random intercept (see details for model). The resulting model is an alternative to the negative binomial model, see negbinBvs. Variance selection of the random intercept may be useful to examine whether overdispersion is present in the data.

Author(s)

Michaela Dvorzak <m.dvorzak@gmx.at>, Helga Wagner

References

Dvorzak, M. and Wagner, H. (2016). Sparse Bayesian modelling of underreported count data. Statistical Modelling, 16(1), 24 - 46, doi:10.1177/1471082x15588398.

Fruehwirth-Schnatter, S. (2007). Matlab package bayesf 2.0 on Finite Mixture and Markov Switching Models, Springer. https://statmath.wu.ac.at/~fruehwirth/monographie/.

Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Statistics and Computing, 19, 479 - 492.

Wagner, H. and Duller, C. (2012). Bayesian model selection for logistic regression models with random intercept. Computational Statistics and Data Analysis, 56, 1256 - 1274.

Examples

## Not run: 
## Examples below should take about 1-2 minutes.

## ------ (use simul_pois1) ------
# load simulated data set 'simul_pois1'
data(simul_pois1)
y <- simul_pois1$y
X <- as.matrix(simul_pois1[, -1])

# Bayesian variable selection for simulated data set
m1 <- poissonBvs(y = y, X = X)

# print, summarize and plot results
print(m1)
summary(m1)
plot(m1, maxPlots = 4)
plot(m1, burnin = FALSE, thin = FALSE, maxPlots = 4)
plot(m1, type = "acf")

# MCMC sampling without BVS with specific MCMC and prior settings
m2 <- poissonBvs(y = y, X = X, prior = list(slab = "Normal"), 
                 mcmc = list(M = 6000, thin = 10), BVS = FALSE)
print(m2)
summary(m2, IAT = TRUE)
plot(m2)
# show traceplots disregarding thinning
plot(m2, thin = FALSE)

# specification of an overdispersed Poisson model with observation-specific 
# (normal) random intercept
cID <- seq_along(y)
m3  <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID))

# print, summarize and plot results
print(m3)
summary(m3) 
# note that variance selection of the random intercept indicates that 
# overdispersion is not present in the data
plot(m3, burnin = FALSE, thin = FALSE)

## ------ (use simul_pois2) ------
# load simulated data set 'simul_pois2'
data(simul_pois2)
y <- simul_pois2$y
X <- as.matrix(simul_pois2[, -c(1,2)])
cID <- simul_pois2$cID

# BVS for a Poisson model with cluster-specific random intercept
m4 <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID),
                 mcmc = list(M = 4000, burnin = 2000))
print(m4)
summary(m4)
plot(m4)
                               
# similar to m4, but without variance selection of the random intercept term
model <- list(gammafix = 1, ri = 1, clusterID = cID)
m5 <- poissonBvs(y = y, X = X, model = model, mcmc = list(M = 4000, thin = 5))
print(m5)       
summary(m5)          
plot(m5)

# MCMC sampling without BVS for clustered observations
m6 <- poissonBvs(y = y, X = X, model = list(ri = 1, clusterID = cID), 
                 BVS = FALSE)
print(m6)         
summary(m6)        
plot(m6, maxPlots = 4)

## End(Not run)

[Package pogit version 1.3.0 Index]