qreg.spike {BoomSpikeSlab}R Documentation

Quantile Regression

Description

MCMC algorithm for quasi-Bayesian quantile models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

Usage

qreg.spike(formula,
               quantile,
               niter,
               ping = niter / 10,
               nthreads = 0,
               data,
               subset,
               prior = NULL,
               na.action = options("na.action"),
               contrasts = NULL,
               drop.unused.levels = TRUE,
               initial.value = NULL,
               seed = NULL,
               ...)

Arguments

formula

Formula for the maximal model (with all variables included).

quantile

A scalar value between 0 and 1 indicating the quantile of the conditional distribution being modeled.

niter

The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.

ping

If positive, then print a status update to the console every ping MCMC iterations.

nthreads

The number of CPU-threads to use for data augmentation. There is some small overhead to stopping and starting threads. For small data sets, thread overhead will make it faster to run single threaded. For larger data sets multi-threading can speed things up substantially. This is all machine dependent, so please experiment.

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which qreg.spike is called.

subset

An optional vector specifying a subset of observations to be used in the fitting process.

prior

An optional list such as that returned from SpikeSlabPrior. If missing, SpikeSlabPrior will be called using the extra arguments passed via ....

na.action

A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

contrasts

An optional list. See the contrasts.arg of model.matrix.default.

drop.unused.levels

A logical value indicating whether factor levels that are unobserved should be dropped from the model.

initial.value

Initial value for the MCMC algorithm. Can either be a numeric vector, a glm object (from which the coefficients will be used), or a qreg.spike object. If a qreg.spike object is supplied, it is assumed to be from a previous MCMC run for which niter additional draws are desired. If a glm object is supplied then its coefficients will be used as the initial values for the simulation.

seed

Seed to use for the C++ random number generator. It should be NULL or an int. If NULL the seed value will be taken from the global .Random.seed object.

...

Extra arguments to be passed to SpikeSlabPrior.

Details

Just like ordinary regression models the mean of a distribution as a linear function of X, quantile regression models a specific quantile (e.g. the 90th percentile) as a function of X.

Median regression is a special case of quantile regression. Median regression is sometimes cast in terms of minimizing |y - X * beta|, because the median is the optimal action under L1 loss. Similarly, selecting quantile tau is optimal under the asymmetric loss function

\rho_\tau(u) = \tau u I(u > 0) + (1-\tau) u I(u < 0)

Thus quantile regression (for a specific quantile tau) minimizes

Q(\beta) = \sum_i \rho_\tau( y_i - \beta^Tx_i)

Bayesian quantile regression treats

\exp(-2Q(\beta))

as a likelihood function to which a prior distribution p(\beta) is applied. For posterior sampling, a data augmentation scheme is used where each observation is associated with a latent variable \lambda_i, which has a marginal distribution of

Exp(2 \tau(1-\tau)).

The conditional distribution given the residual r = y - x \beta is

\frac{1}{\lambda} | r \sim InvGaus( 1 / |r|, 1.0)

The conditional distribution of beta given complete data (lambda and y) is a weighted least squares regression, where observation i has precision \lambda_i and where observation i is offset by 2(\tau - 1)\lambda_i.

Value

Returns an object of class qreg.spike, which inherits from lm.spike. The returned object is a list with the following elements

beta

A niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.

prior

The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.

Author(s)

Steven L. Scott

References

Parzen and Polson (2011, unpublished)

See Also

lm.spike SpikeSlabPrior, plot.qreg.spike, predict.qreg.spike.

Examples

  n <- 50
  x <- rnorm(n)
  y <- rnorm(n, 4 * x)
  model <- qreg.spike(y ~ x,
                      quantile = .8,
                      niter = 1000,
                      expected.model.size = 100)

  ## Should get a slope near 4 and an intercept near qnorm(.8).
  PlotManyTs(model$beta[-(1:100),],
             same.scale = TRUE,
             truth = c(qnorm(.8), 4))


[Package BoomSpikeSlab version 1.2.6 Index]