SSVSquantreg {MCMCpack} | R Documentation |
Stochastic search variable selection for quantile regression
Description
This function uses stochastic search to select promising regression models
at a fixed quantile \tau
. Indicator variables
\gamma
are used to represent whether a predictor is included in
the model or not. The user supplies the data and the prior distribution on
the model size. A list is returned containing the posterior sample of
\gamma
and the associated regression parameters
\beta
.
Usage
SSVSquantreg(
formula,
data = NULL,
tau = 0.5,
include = NULL,
burnin = 1000,
mcmc = 10000,
thin = 1,
verbose = 0,
seed = sample(1:1e+06, 1),
pi0a0 = 1,
pi0b0 = 1,
...
)
Arguments
formula |
Model formula. |
data |
Data frame. |
tau |
The quantile of interest. Must be between 0 and 1. The default value of 0.5 corresponds to median regression model selection. |
include |
The predictor(s) that should definitely appear in the model. Can be specified by name, or their position in the formula (taking into account the intercept). |
burnin |
The number of burn-in iterations for the sampler. |
mcmc |
The number of MCMC iterations after burnin. |
thin |
The thinning interval used in the simulation. The number of MCMC iterations must be divisible by this value. |
verbose |
A switch which determines whether or not the progress of the
sampler is printed to the screen. If |
seed |
The seed for the random number generator. If NA, the Mersenne
Twister generator is used with default seed 12345; if an integer is passed
it is used to seed the Mersenne twister. The default value for this argument
is a random integer between 1 and 1,000,000. This default value ensures
that if the function is used again with a different value of
|
pi0a0 , pi0b0 |
Hyperparameters of the beta prior on |
... |
Further arguments |
Details
SSVSquantreg
implements stochastic search variable selection
over a set of potential predictors to obtain promising models. The
models considered take the following form:
Q_{\tau}(y_i|x_{i\gamma}) = x_{i\gamma} ' \beta_{\gamma},
where Q_{\tau}(y_i|x_{i\gamma})
denotes the conditional
\tau
th quantile of y_i
given x_{i\gamma}
,
x_{i\gamma}
denotes x_i
with those predictors
x_{ij}
for which \gamma_j=0
removed and
\beta_{\gamma}
denotes the model specific regression
parameters.
The likelihood is formed based on the assumption of independent
asymmetric Laplace distributions on the y_i
with skewness
parameter \tau
and location parameters x_{i\gamma} '
\beta_{\gamma}
. This assumption ensures that the likelihood
function is maximised by the \tau
th conditional quantile of
the response variable.
The prior on each \beta_j
is
(1-\gamma_j)\delta_0+\gamma_j\mbox{Cauchy}(0,1),
where \delta_0
denotes a degenerate distribution with all
mass at 0. A standard Cauchy distribution is chosen conditional on
\gamma_j=1
. This allows for a wider range of nonzero
values of \beta_j
than a standard Normal distribution,
improving the robustness of the method. Each of the indicator variables
\gamma_j
is independently assigned a Bernoulli prior, with
prior probability of inclusion \pi_0
. This in turn is assigned
a beta distribution, resulting in a beta-binomial prior on the model size.
The user can supply the hyperparameters for the beta distribution. Starting
values are randomly generated from the prior distribution.
It is recommended to standardise any non-binary predictors in order to
compare these predictors on the same scale. This can be achieved using the
scale
function.
If it is certain that a predictor should be included, all predictors specified are brought to the first positions for computational convenience. The regression parameters associated with these predictors are given independent improper priors. Users may notice a small speed advantage if they specify the predictors that they feel certain should appear in the model, particularly for large models with a large number of observations.
Value
A list containing:
gamma |
The posterior sample of |
beta |
The posterior sample of the associated regression parameters
|
Author(s)
Craig Reed
References
Craig Reed, David B. Dunson and Keming Yu. 2010. "Bayesian Variable Selection for Quantile Regression" Technical Report.
Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.2. http://scythe.lsa.umich.edu.
Keming Yu and Jin Zhang. 2005. "A Three Parameter Asymmetric Laplace Distribution and it's extensions." Communications in Statistics - Theory and Methods, 34, 1867-1879.
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. “Output Analysis and Diagnostics for MCMC (CODA)”, R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.
See Also
MCMCquantreg
,
summary.qrssvs
, plot.qrssvs
,
mptable
, topmodels
,
scale
, rq
Examples
## Not run:
set.seed(1)
epsilon<-rnorm(100)
set.seed(2)
x<-matrix(rnorm(1000),100,10)
y<-x[,1]+x[,10]+epsilon
qrssvs<-SSVSquantreg(y~x)
model.50pc<-SSVSquantreg(y~x)
model.90pc<-SSVSquantreg(y~x,tau=0.9)
summary(model.50pc) ## Intercept not in median probability model
summary(model.90pc) ## Intercept appears in median probability model
## End(Not run)