geomc.vs {geommc}R Documentation

Markov chain Monte Carlo for Bayesian variable selection using a geometric MH algorithm.

Description

geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.

Usage

geomc.vs(
  X,
  y,
  initial = NULL,
  n.iter = 50,
  burnin = 1,
  eps = 0.5,
  symm = TRUE,
  move.prob = c(0.4, 0.4, 0.2),
  model.threshold = 0.5,
  lam = nrow(X)/ncol(X)^2,
  w = sqrt(nrow(X))/ncol(X)
)

Arguments

X

The n\times p covariate matrix without intercept. The following classes are supported: matrix and dgCMatrix. No need to center or scale this matrix manually. Scaling is performed implicitly and regression coefficients are returned on the original scale.

y

The response vector of length n. No need to center or scale.

initial

is the initial model (the set of active variables). Default: Null model.

n.iter

is the no. of samples needed. Default: 50.

burnin

is the value of burnin used to compute the median probability model. Default: 1.

eps

is the value for epsilon perturbation. Default: 0.5.

symm

indicates if the base density is of symmetric RW-MH. Default: True.

move.prob

is the vector of ('addition', 'deletion', 'swap') move probabilities. Default: (0.4,0.4,0.2). move.prob is used only when symm is set to False.

model.threshold

The threshold probability to select the covariates for the median model (median.model) and the weighted average model (wam). A covariate will be included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) is greater than the threshold. Default: 0.5.

lam

The slab precision parameter. Default: n/p^2 as suggested by the theoretical results of Li, Dutta, Roy (2023).

w

The prior inclusion probability of each variable. Default: \sqrt{n}/p.

Details

geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows:

y | X, \beta_0,\beta,\gamma,\sigma^2,w,\lambda \sim N(\beta_01 + X_\gamma\beta_\gamma,\sigma^2I_n)

\beta_i|\beta_0,\gamma,\sigma^2,w,\lambda \stackrel{indep.}{\sim} N(0, \gamma_i\sigma^2/\lambda),~i=1,\ldots,p,

(\beta_0,\sigma^2)|\gamma,w,p \sim p(\beta_0,\sigma^2) \propto 1/\sigma^2

\gamma_i|w,\lambda \stackrel{iid}{\sim} Bernoulli(w)

where X_\gamma is the n \times |\gamma| submatrix of X consisting of those columns of X for which \gamma_i=1 and similarly, \beta_\gamma is the |\gamma| subvector of \beta corresponding to \gamma. geomc.vs provides MCMC samples from the posterior pmf of the models P(\gamma|y), which is available up to a normalizing constant. geomc.vs also returns the marginal inclusion probabilities (mip) computed by the Monte Carlo average as well as the weighted marginal inclusion probabilities (wmip) computed with weights

w_i = P(\gamma^{(i)}|y)/\sum_{k=1}^K P(\gamma^{(k)}|y), i=1,2,...,K

where K is the number of distinct models sampled. Thus, based on the samples \gamma^{(k)}, k=1,2,...,n.iter mip for the jth variable is

mip_j = \sum_{k=1}^{n.iter} I(\gamma^{(k)}_j = 1)/n.iter

and wmip is as

wmip_j = \sum_{k=1}^K w_k I(\gamma^{(k)}_j = 1).

The median.model is the model containing variables j with mip_j > \code{model.threshold} and the wam is the model containing variables j with wmip_j > \code{model.threshold}. The conditional posterior mean of \beta given a model is available in closed form. geomc.vs returns the posterior means of \beta conditional on the median.model and the wam.

Value

A list with components

samples

MCMC samples from P(\gamma|y) returned as p \timesn.iter sparse dgCMatrix.

acceptance.rate

The acceptance rate based on all samples.

mip

The p vector of marginal inclusion probabilities of all variables based on post burnin samples.

median.model

The median probability model based on post burnin samples.

beta.med

The posterior mean of \beta (the p+1 vector c(intercept, regression coefficients)) conditional on the median.model based on post burnin samples.

wmip

The p vector of weighted marginal inclusion probabilities of all variables based on post burnin samples.

wam

The weighted average model based on post burnin samples.

beta.wam

The posterior mean of \beta (the p+1 vector c(intercept, regression coefficients)) conditional on the wam based on post burnin samples.

log.post

The n.iter vector of log of the unnormalized marginal posterior pmf P(\gamma|y) evaluated at the samples.

Author(s)

Vivekananda Roy

References

Roy, V.(2024) A geometric approach to informative MCMC sampling https://arxiv.org/abs/2406.09010

Li, D., Dutta, S., Roy, V.(2023) Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings, Journal of Computational and Graphical Statistics, 32, 61-73

Examples

n=50; p=100; nonzero = 3
trueidx <- 1:3
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5
xone <- matrix(rnorm(n*p), n, p)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)
y <- 0.5 + X %*% TrueBeta + rnorm(n)
result <- geomc.vs(X=X, y=y)
result$samples # the MCMC samples
result$acceptance.rate #the acceptance.rate
result$mip #marginal inclusion probabilities
result$wmip #weighted marginal inclusion probabilities
result$median.model #the median.model
result$wam #the weighted average model
result$beta.med #the posterior mean of regression coefficients for the median.model
result$beta.wam #the posterior mean of regression coefficients for the wam
result$log.post #the log (unnormalized) posterior probabilities of the MCMC samples.

[Package geommc version 0.0.1 Index]