geomc.vs {geommc}    R Documentation
Markov chain Monte Carlo for Bayesian variable selection using a geometric MH algorithm.
Description
geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.
Usage
geomc.vs(
X,
y,
initial = NULL,
n.iter = 50,
burnin = 1,
eps = 0.5,
symm = TRUE,
move.prob = c(0.4, 0.4, 0.2),
model.threshold = 0.5,
lam = nrow(X)/ncol(X)^2,
w = sqrt(nrow(X))/ncol(X)
)
Arguments
X: The n x p covariate matrix.

y: The response vector of length n.

initial: The initial model, that is, the set of initially active variables. Default: the null model.

n.iter: The number of MCMC samples to draw. Default: 50.

burnin: The number of burnin samples discarded before computing the median probability model. Default: 1.

eps: The value of the epsilon perturbation. Default: 0.5.

symm: Logical; indicates whether the base density is that of a symmetric RW-MH. Default: TRUE.

move.prob: The vector of probabilities for the 'addition', 'deletion', and 'swap' moves. move.prob is used only when symm is set to FALSE (a call using these non-default settings is sketched after this list). Default: c(0.4, 0.4, 0.2).

model.threshold: The threshold probability for selecting covariates in the median probability model (median.model) and the weighted average model (wam). A covariate is included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) exceeds the threshold. Default: 0.5.

lam: The slab precision parameter. Default: nrow(X)/ncol(X)^2.

w: The prior inclusion probability of each variable. Default: sqrt(nrow(X))/ncol(X).
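For illustration, the call below is a minimal sketch using the asymmetric base density (symm = FALSE) together with explicit move probabilities; the argument values are arbitrary illustrative choices, and X and y are assumed to be already defined.

## Sketch with illustrative (non-default) settings; X and y assumed available.
fit <- geomc.vs(X = X, y = y,
                n.iter = 200,                  # more samples than the default 50
                burnin = 50,                   # discard the first 50 samples
                symm = FALSE,                  # asymmetric base density
                move.prob = c(0.5, 0.3, 0.2),  # addition, deletion, swap probabilities
                model.threshold = 0.5)
fit$median.model                               # covariates in the median probability model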
Details
geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows:

$$y \mid X, \beta_0, \beta, \gamma, \sigma^2, w, \lambda \sim N(\beta_0 1_n + X_\gamma \beta_\gamma, \sigma^2 I_n),$$

$$\beta_i \mid \beta_0, \gamma, \sigma^2, w, \lambda \overset{ind.}{\sim} N(0, \gamma_i \sigma^2/\lambda), \quad i = 1, \ldots, p,$$

$$(\beta_0, \sigma^2) \mid \gamma, w, \lambda \sim p(\beta_0, \sigma^2) \propto 1/\sigma^2,$$

$$\gamma_i \mid w, \lambda \overset{iid}{\sim} \mathrm{Bernoulli}(w),$$

where $X_\gamma$ is the $n \times |\gamma|$ submatrix of $X$ consisting of those columns of $X$ for which $\gamma_i = 1$ and, similarly, $\beta_\gamma$ is the $|\gamma|$ subvector of $\beta$ corresponding to $\gamma$.
geomc.vs provides MCMC samples from the posterior pmf of the models $P(\gamma \mid y)$, which is available up to a normalizing constant.

geomc.vs also returns the marginal inclusion probabilities (mip) computed by the Monte Carlo average as well as the weighted marginal inclusion probabilities (wmip) computed with weights

$$w_k = \frac{P(\gamma^{(k)} \mid y)}{\sum_{l=1}^{K} P(\gamma^{(l)} \mid y)}, \quad k = 1, \ldots, K,$$

where $K$ is the number of distinct models sampled. Thus, based on the samples $\gamma^{(k)}$, $k = 1, \ldots,$ n.iter, the mip for the $j$th variable is

$$\mathrm{mip}_j = \frac{1}{\mathrm{n.iter}} \sum_{k=1}^{\mathrm{n.iter}} I\bigl(\gamma^{(k)}_j = 1\bigr)$$

and the wmip for the $j$th variable is

$$\mathrm{wmip}_j = \sum_{k=1}^{K} w_k I\bigl(\gamma^{(k)}_j = 1\bigr).$$

The median.model is the model containing the variables $j$ with $\mathrm{mip}_j >$ model.threshold, and the wam is the model containing the variables $j$ with $\mathrm{wmip}_j >$ model.threshold. The conditional posterior mean of $\beta$ given a model is available in closed form. geomc.vs returns the posterior means of $\beta$ conditional on the median.model and the wam.
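To make the mip and wmip definitions concrete, here is a minimal sketch that applies them to a hypothetical 0/1 matrix gam of sampled models (one row per iteration, one column per variable) and the corresponding log unnormalized posterior values; it illustrates the formulas above and is not the package's internal code.

## Minimal sketch of the mip/wmip definitions (not the package's internal code).
## 'gam' is a hypothetical n.iter x p 0/1 matrix of sampled models and
## 'log.post' the corresponding log unnormalized posterior values of P(gamma|y).
mip <- colMeans(gam)                            # Monte Carlo average of inclusion indicators
uniq <- !duplicated(gam)                        # one row per distinct sampled model
wts <- exp(log.post[uniq] - max(log.post[uniq]))
wts <- wts/sum(wts)                             # normalized weights w_k
wmip <- colSums(wts*gam[uniq, , drop = FALSE])  # weighted inclusion probabilities
median.model <- which(mip > 0.5)                # variables with mip above the threshold
wam <- which(wmip > 0.5)                        # variables with wmip above the threshold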
Value
A list with components
samples: MCMC samples from $P(\gamma \mid y)$.

acceptance.rate: The acceptance rate based on all samples.

mip: The $p$ vector of marginal inclusion probabilities of all variables based on post burnin samples.

median.model: The median probability model based on post burnin samples.

beta.med: The posterior mean of $\beta$ (the regression coefficients) conditional on the median.model, based on post burnin samples.

wmip: The $p$ vector of weighted marginal inclusion probabilities of all variables based on post burnin samples.

wam: The weighted average model based on post burnin samples.

beta.wam: The posterior mean of $\beta$ conditional on the wam, based on post burnin samples.

log.post: The n.iter vector of the log of the unnormalized marginal posterior pmf $P(\gamma \mid y)$ evaluated at the sampled models.
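As a usage note, and assuming mip is returned as a numeric vector of length $p$ and log.post as a numeric vector of length n.iter, these components can be inspected post hoc as sketched below (the threshold 0.3 is an arbitrary illustrative value).

## Post-hoc use of the returned components; 0.3 is an arbitrary illustrative threshold.
res <- geomc.vs(X = X, y = y)
looser <- which(res$mip > 0.3)   # variables passing a looser threshold than model.threshold
best <- which.max(res$log.post)  # position of the highest-posterior sampled model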
Author(s)
Vivekananda Roy
References
Roy, V. (2024). A geometric approach to informative MCMC sampling. https://arxiv.org/abs/2406.09010

Li, D., Dutta, S., and Roy, V. (2023). Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings. Journal of Computational and Graphical Statistics, 32, 61-73.
Examples
n <- 50; p <- 100; nonzero <- 3
trueidx <- 1:nonzero
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5                                  # common pairwise correlation of the covariates
xone <- matrix(rnorm(n*p), n, p)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)  # equicorrelated covariate matrix
y <- 0.5 + X %*% TrueBeta + rnorm(n)        # response with intercept 0.5 and 3 active covariates
result <- geomc.vs(X=X, y=y)
result$samples          # the MCMC samples
result$acceptance.rate  # the acceptance rate
result$mip              # marginal inclusion probabilities
result$wmip             # weighted marginal inclusion probabilities
result$median.model     # the median probability model
result$wam              # the weighted average model
result$beta.med         # posterior mean of the regression coefficients for the median.model
result$beta.wam         # posterior mean of the regression coefficients for the wam
result$log.post         # log (unnormalized) posterior probabilities of the MCMC samples
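As a follow-up to the simulated example above (not part of the original example, and assuming median.model is returned as the vector of selected variable indices), the selection can be compared with the truly active set.

## Follow-up check (assumes median.model is a vector of selected variable indices).
intersect(result$median.model, trueidx)  # correctly identified signals
setdiff(result$median.model, trueidx)    # false positives, if any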