geomc.vs {geommc} | R Documentation |
Markov chain Monte Carlo for Bayesian variable selection using a geometric MH algorithm.
Description
geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.
Usage
geomc.vs(
X,
y,
initial = NULL,
n.iter = 50,
burnin = 1,
eps = 0.5,
symm = TRUE,
move.prob = c(0.4, 0.4, 0.2),
model.threshold = 0.5,
lam = nrow(X)/ncol(X)^2,
w = sqrt(nrow(X))/ncol(X)
)
Arguments
X |
The |
y |
The response vector of length |
initial |
is the initial model (the set of active variables). Default: Null model. |
n.iter |
is the no. of samples needed. Default: 50. |
burnin |
is the value of burnin used to compute the median probability model. Default: 1. |
eps |
is the value for epsilon perturbation. Default: 0.5. |
symm |
indicates if the base density is of symmetric RW-MH. Default: True. |
move.prob |
is the vector of ('addition', 'deletion', 'swap') move probabilities. Default: (0.4,0.4,0.2). move.prob is used only when symm is set to False. |
model.threshold |
The threshold probability to select the covariates for the median model (median.model) and the weighted average model (wam). A covariate will be included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) is greater than the threshold. Default: 0.5. |
lam |
The slab precision parameter. Default: |
w |
The prior inclusion probability of each variable. Default: |
Details
geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows:
y | X, \beta_0,\beta,\gamma,\sigma^2,w,\lambda \sim N(\beta_01 + X_\gamma\beta_\gamma,\sigma^2I_n)
\beta_i|\beta_0,\gamma,\sigma^2,w,\lambda \stackrel{indep.}{\sim} N(0, \gamma_i\sigma^2/\lambda),~i=1,\ldots,p,
(\beta_0,\sigma^2)|\gamma,w,p \sim p(\beta_0,\sigma^2) \propto 1/\sigma^2
\gamma_i|w,\lambda \stackrel{iid}{\sim} Bernoulli(w)
where X_\gamma
is the n \times |\gamma|
submatrix of X
consisting of
those columns of X
for which \gamma_i=1
and similarly, \beta_\gamma
is the
|\gamma|
subvector of \beta
corresponding to \gamma
.
geomc.vs provides MCMC samples from the posterior pmf of the models P(\gamma|y)
, which is available
up to a normalizing constant.
geomc.vs also returns the marginal inclusion probabilities (mip)
computed by the Monte Carlo average as well as the weighted marginal inclusion
probabilities (wmip) computed with weights
w_i =
P(\gamma^{(i)}|y)/\sum_{k=1}^K P(\gamma^{(k)}|y), i=1,2,...,K
where K
is the number of distinct
models sampled. Thus, based on the samples \gamma^{(k)}, k=1,2,...,n.iter
mip for the j
th variable is
mip_j =
\sum_{k=1}^{n.iter} I(\gamma^{(k)}_j = 1)/n.iter
and wmip is as
wmip_j =
\sum_{k=1}^K w_k I(\gamma^{(k)}_j = 1).
The median.model is the model containing variables j
with mip_j >
\code{model.threshold}
and the wam is the model containing variables j
with wmip_j >
\code{model.threshold}
. The conditional posterior mean of \beta
given a model is
available in closed form. geomc.vs returns the posterior means of \beta
conditional on the median.model and
the wam.
Value
A list with components
samples |
MCMC samples from |
acceptance.rate |
The acceptance rate based on all samples. |
mip |
The |
median.model |
The median probability model based on post burnin samples. |
beta.med |
The posterior mean of |
wmip |
The |
wam |
The weighted average model based on post burnin samples. |
beta.wam |
The posterior mean of |
log.post |
The n.iter vector of log of the unnormalized marginal posterior pmf |
Author(s)
Vivekananda Roy
References
Roy, V.(2024) A geometric approach to informative MCMC sampling https://arxiv.org/abs/2406.09010
Li, D., Dutta, S., Roy, V.(2023) Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings, Journal of Computational and Graphical Statistics, 32, 61-73
Examples
n=50; p=100; nonzero = 3
trueidx <- 1:3
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5
xone <- matrix(rnorm(n*p), n, p)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)
y <- 0.5 + X %*% TrueBeta + rnorm(n)
result <- geomc.vs(X=X, y=y)
result$samples # the MCMC samples
result$acceptance.rate #the acceptance.rate
result$mip #marginal inclusion probabilities
result$wmip #weighted marginal inclusion probabilities
result$median.model #the median.model
result$wam #the weighted average model
result$beta.med #the posterior mean of regression coefficients for the median.model
result$beta.wam #the posterior mean of regression coefficients for the wam
result$log.post #the log (unnormalized) posterior probabilities of the MCMC samples.