R: Quantile Regression

qreg.spike {BoomSpikeSlab}

R Documentation

Quantile Regression

Description

MCMC algorithm for quasi-Bayesian quantile models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

Usage

qreg.spike(formula,
               quantile,
               niter,
               ping = niter / 10,
               nthreads = 0,
               data,
               subset,
               prior = NULL,
               na.action = options("na.action"),
               contrasts = NULL,
               drop.unused.levels = TRUE,
               initial.value = NULL,
               seed = NULL,
               ...)

Arguments

`formula`	Formula for the maximal model (with all variables included).
`quantile`	A scalar value between 0 and 1 indicating the quantile of the conditional distribution being modeled.
`niter`	The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.
`ping`	If positive, then print a status update to the console every `ping` MCMC iterations.
`nthreads`	The number of CPU-threads to use for data augmentation. There is some small overhead to stopping and starting threads. For small data sets, thread overhead will make it faster to run single threaded. For larger data sets multi-threading can speed things up substantially. This is all machine dependent, so please experiment.
`data`	An optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `qreg.spike` is called.
`subset`	An optional vector specifying a subset of observations to be used in the fitting process.
`prior`	An optional list such as that returned from `SpikeSlabPrior`. If missing, `SpikeSlabPrior` will be called using the extra arguments passed via ....
`na.action`	A function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The `factory-fresh` default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful.
`contrasts`	An optional list. See the `contrasts.arg` of `model.matrix.default`.
`drop.unused.levels`	A logical value indicating whether factor levels that are unobserved should be dropped from the model.
`initial.value`	Initial value for the MCMC algorithm. Can either be a numeric vector, a `glm` object (from which the coefficients will be used), or a `qreg.spike` object. If a `qreg.spike` object is supplied, it is assumed to be from a previous MCMC run for which `niter` additional draws are desired. If a `glm` object is supplied then its coefficients will be used as the initial values for the simulation.
`seed`	Seed to use for the C++ random number generator. It should be `NULL` or an int. If `NULL` the seed value will be taken from the global `.Random.seed` object.
`...`	Extra arguments to be passed to `SpikeSlabPrior`.

Details

Just like ordinary regression models the mean of a distribution as a linear function of X, quantile regression models a specific quantile (e.g. the 90th percentile) as a function of X.

Median regression is a special case of quantile regression. Median regression is sometimes cast in terms of minimizing |y - X * beta|, because the median is the optimal action under L1 loss. Similarly, selecting quantile tau is optimal under the asymmetric loss function

\rho_\tau(u) = \tau u I(u > 0) + (1-\tau) u I(u < 0)

Thus quantile regression (for a specific quantile tau) minimizes

Q(\beta) = \sum_i \rho_\tau( y_i - \beta^Tx_i)

Bayesian quantile regression treats

\exp(-2Q(\beta))

as a likelihood function to which a prior distribution p(\beta) is applied. For posterior sampling, a data augmentation scheme is used where each observation is associated with a latent variable \lambda_i, which has a marginal distribution of

Exp(2 \tau(1-\tau)).

The conditional distribution given the residual r = y - x \beta is

\frac{1}{\lambda} | r \sim InvGaus( 1 / |r|, 1.0)

The conditional distribution of beta given complete data (lambda and y) is a weighted least squares regression, where observation i has precision \lambda_i and where observation i is offset by 2(\tau - 1)\lambda_i.

Value

Returns an object of class qreg.spike, which inherits from lm.spike. The returned object is a list with the following elements

`beta`	A `niter` by `ncol(x)` matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.
`prior`	The prior used to fit the model. If a `prior` was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.

Author(s)

Steven L. Scott

References

Parzen and Polson (2011, unpublished)

Examples

  n <- 50
  x <- rnorm(n)
  y <- rnorm(n, 4 * x)
  model <- qreg.spike(y ~ x,
                      quantile = .8,
                      niter = 1000,
                      expected.model.size = 100)

  ## Should get a slope near 4 and an intercept near qnorm(.8).
  PlotManyTs(model$beta[-(1:100),],
             same.scale = TRUE,
             truth = c(qnorm(.8), 4))

[Package BoomSpikeSlab version 1.2.6 Index]