approx_horseshoe {Mhorseshoe}		R Documentation

Run approximate MCMC algorithm for horseshoe prior

Description

This function implements the algorithm introduced in Section 2.2 of Johndrow et al. (2020), a horseshoe estimator designed for the case where p >> N. The assumptions and notation for the model are the same as those in Mhorseshoe. The algorithm introduces a threshold and uses only a subset of the p columns of the design matrix for matrix multiplication, lowering the computational cost compared to the exact horseshoe estimator. According to Section 3.2 of Johndrow et al. (2020), this approximate MCMC algorithm constructs an approximate Markov chain P_{\epsilon} that can converge to the exact Markov chain P, and acceptable results were confirmed through empirical analysis of simulated and real data. The auto.threshold argument of this function enables an adaptive probability algorithm for the threshold, developed in this package, which estimates and updates a new threshold from the updated shrinkage parameters.

Usage

approx_horseshoe(
  X,
  y,
  burn = 1000,
  iter = 5000,
  auto.threshold = TRUE,
  threshold = 0,
  tau = 1,
  s = 0.8,
  sigma2 = 1,
  w = 1,
  alpha = 0.05,
  a = 0.2,
  b = 10,
  t = 10,
  adapt_p0 = 0,
  adapt_p1 = -4.6 * 10^(-4)
)

Arguments

X

Design matrix, X \in \mathbb{R}^{N \times p}.

y

Response vector, y \in \mathbb{R}^{N}.

burn

Number of burn-in samples. Default is 1000.

iter

Number of samples to be drawn from the posterior. Default is 5000.

auto.threshold

Logical. If TRUE, the threshold is updated automatically by the adaptive probability algorithm described below. Default is TRUE.

threshold

Threshold to be used in the approximate MCMC algorithm. If auto.threshold = FALSE and threshold = 0 (the default), the threshold is set from the sizes of N and p: if p < N, \delta = 1/\sqrt{Np}; otherwise \delta = 1/p. You can instead supply a custom value directly through this argument. For more information about \delta, see Mhorseshoe and Section 4.1 of Johndrow et al. (2020).
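
For reference, this default rule can be sketched in R as follows; default_delta is a hypothetical helper written for illustration, not a function exported by this package.

default_delta <- function(N, p) {
  if (p < N) 1 / sqrt(N * p) else 1 / p
}
default_delta(200, 100) # p < N, so delta = 1/sqrt(Np)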

tau

Initial value of the global shrinkage parameter \tau when starting the algorithm. Default is 1.

s

s^{2} is the variance of the Metropolis-Hastings proposal distribution for \tau; 0.8 is a reasonable default. If set to 0, the algorithm runs with the global shrinkage parameter \tau fixed at its initial value.

sigma2

Initial value of the error variance \sigma^{2}. Default is 1.

w

Parameter of gamma prior for \sigma^{2}. Default is 1.

alpha

Sets the level of the 100(1-\alpha)\% credible intervals. Default is 0.05.

a

Parameter of the rejection sampler. It is recommended to keep the default value, a = 1/5 (i.e., 0.2).

b

Parameter of the rejection sampler. It is recommended to keep the default value, b = 10.

t

Threshold update cycle for the adaptive probability algorithm when auto.threshold = TRUE. Default is 10.

adapt_p0

Parameter p_{0} of the adaptive probability p(t) = exp[p_{0} + p_{1}t]. Default is 0.

adapt_p1

Parameter p_{1} of the adaptive probability p(t) = exp[p_{0} + p_{1}t]. Default is -4.6 \times 10^{-4}.

Value

BetaHat

Posterior mean of \beta.

LeftCI

Lower bound of 100(1-\alpha)\% credible interval for \beta.

RightCI

Upper bound of 100(1-\alpha)\% credible interval for \beta.

Sigma2Hat

Posterior mean of \sigma^{2}.

TauHat

Posterior mean of \tau.

LambdaHat

Posterior mean of \lambda_{j},\ j=1,2,...,p.

ActiveMean

Average number of elements in the active set per iteration in this algorithm.

BetaSamples

Samples from the posterior of \beta.

LambdaSamples

\lambda samples drawn by the rejection sampler.

TauSamples

\tau samples drawn by the MH algorithm.

Sigma2Samples

Samples from the posterior of \sigma^{2}.

ActiveSet

Matrix indicating active elements as 1 and non-active elements as 0 per iteration of the MCMC algorithm.

Approximate algorithm details

Compared with the exact algorithm, the approximate algorithm makes the following changes:

D_{\delta} = diag\left(\eta_{j}^{-1}1\left(\xi^{-1}\eta_{j}^{-1} > \delta \right),\ j=1,2,...,p \right),

M_{\xi} \approx M_{\xi, \delta} = I_{N} + \xi^{-1}XD_{\delta}X^{T},

where \eta_{j} = \lambda_{j}^{-2}, the \lambda_{j} are the local shrinkage parameters, \xi = \tau^{-2}, \tau is the global shrinkage parameter, 1(\cdot) is an indicator function returning 1 if the condition in parentheses is satisfied and 0 otherwise, and \delta is the threshold. The set of columns of X, \{x_{j}\ :\ \xi^{-1}\eta_{j}^{-1} > \delta,\ j = 1,2,...,p \}, is defined as the active set, and S denotes its index set:

S = \{j\ |\ \xi^{-1}\eta_{j}^{-1} > \delta,\ j=1,2,...,p \}.
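
For illustration, the active set quantities can be formed in a few lines of R. This is a minimal sketch assuming a design matrix X and objects eta (the vector of \eta_{j}), xi (\xi), and delta (\delta) already exist in the workspace; none of these names are package objects.

S <- which(1 / (xi * eta) > delta)       # indices with xi^{-1} eta_j^{-1} > delta
s_delta <- length(S)                     # size of the active set
X_S <- X[, S, drop = FALSE]              # N x s_delta submatrix of X
D_S <- diag(1 / eta[S], nrow = s_delta)  # s_delta x s_delta diagonal block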

Recall that the conditional posterior distribution of \beta is:

\beta | y, X, \eta, \xi, \sigma \sim N\left(\left(X^{T}X + \left(\xi^{-1} D \right)^{-1}\right)^{-1}X^{T}y, \ \sigma^{2}\left(X^{T}X + \left(\xi^{-1} D \right)^{-1} \right)^{-1} \right).

If \xi^{-1}\eta_{j}^{-1} is very small, the posterior of \beta_{j} has mean and variance close to 0. Therefore, the components with \xi^{-1}\eta_{j}^{-1} smaller than \delta are set to 0, and the size of the matrix to be inverted in M_{\xi, \delta} is reduced as follows.

length(S)=s_{\delta} \le p, \\ X_{S} \in R^{N \times s_{\delta}}, \quad D_{S} \in R^{s_{\delta} \times s_{\delta}}, \\ M_{\xi, \delta}^{-1} = \left(I_{N} + \xi^{-1}X_{S}D_{S}X_{S}^{T} \right)^{-1}.

M_{\xi, \delta}^{-1} can be expressed using the Woodbury identity as follows.

M_{\xi, \delta}^{-1} = I_{N} - X_{S}\left(\xi D_{S}^{-1} + X_{S}^{T}X_{S} \right)^{-1}X_{S}^{T}.
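
Continuing the sketch above, the Woodbury form inverts only an s_delta x s_delta matrix instead of an N x N one:

core <- xi * diag(eta[S], nrow = s_delta) + crossprod(X_S) # xi * D_S^{-1} + X_S^T X_S
M_inv <- diag(N) - X_S %*% solve(core, t(X_S))             # M_{xi, delta}^{-1}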

M_{\xi, \delta}^{-1}, which reduces the computational cost, is applied throughout this algorithm, and \beta samples are drawn from the posterior by fast sampling (Bhattacharya et al., 2016) as follows.

u \sim N_{p}(0, \xi^{-1}D),\quad f \sim N_{N}(0, I_{N}), \\ v = Xu + f,\quad v^{\star} = M_{\xi, \delta}^{-1}(y/\sigma - v), \\ \beta = \sigma(u + \xi^{-1}D_{\delta}X^{T}v^{\star}).
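
A sketch of a single \beta draw implementing the three lines above; sigma, y, and the objects from the previous sketches are assumed, and this is illustrative code rather than the package's internal implementation.

d_full <- 1 / eta                             # diagonal of D
d_act <- numeric(p); d_act[S] <- d_full[S]    # diagonal of D_delta
u <- stats::rnorm(p, sd = sqrt(d_full / xi))  # u ~ N_p(0, xi^{-1} D)
f <- stats::rnorm(N)                          # f ~ N_N(0, I_N)
v <- as.vector(X %*% u) + f
v_star <- as.vector(M_inv %*% (y / sigma - v))
beta <- sigma * (u + (d_act / xi) * as.vector(crossprod(X, v_star)))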

Adaptive probability algorithm for threshold update

If the auto.threshold argument is TRUE, this algorithm runs every t iterations to estimate a new threshold and decide whether to update it. It adds a threshold-update step based on the properties of the shrinkage weights k_{j},\ j=1,2,...,p proposed by Piironen and Vehtari (2017). Under the prior \beta_{j} \sim N(0, \sigma^{2}\tau^{2}\lambda_{j}^{2}) = N(0, \sigma^{2}\xi^{-1} \eta_{j}^{-1}), the quantity m_{eff} is defined as follows.

k_{j} = 1/\left(1+n\xi^{-1}s_{j}^{2}\eta_{j}^{-1} \right), \quad j=1,2,...,p, \\ m_{eff} = \sum_{j=1}^{p}{\left(1-k_{j} \right)}.

The assumptions and notation for the model are the same as those in Mhorseshoe, and s_{j},\ j=1,2,...,p are the diagonal components of X^{T}X. For the zero components of \beta, k_{j} is close to 1, while for the nonzero components k_{j} is close to 0; the quantity m_{eff}, the sum of the 1-k_{j}, is therefore called the effective number of nonzero coefficients. In this algorithm, the threshold \delta is updated so that s_{\delta} = ceiling(m_{eff}). A sketch of this computation follows.
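
The sketch below computes m_{eff} and the resulting target active set size; reading s_{j}^{2} as the j-th diagonal entry of X^{T}X is an assumption made for illustration.

s2 <- colSums(X^2)                 # diagonal of X^T X
k <- 1 / (1 + N * s2 / (xi * eta)) # shrinkage weights k_j
m_eff <- sum(1 - k)                # effective number of nonzero coefficients
ceiling(m_eff)                     # target active set size s_delta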

The adaptive probability is defined to satisfy Theorem 5 (the diminishing adaptation condition) of Roberts and Rosenthal (2007). At the Tth iteration,

p(T) = exp[p_{0} + p_{1}T],\quad p_{1} < 0, \quad u \sim U(0, 1), \\ \text{if } u < p(T),\ \text{update } \delta \text{ so that } s_{\delta} = ceiling(m_{eff}).

The defaults are p_{0} = 0 and p_{1} = -4.6 \times 10^{-4}; under these settings, p(10000) = e^{-4.6} \approx 0.01.
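
A sketch of the update decision at iteration t_iter, using the package defaults for p_{0} and p_{1}; the variable names are illustrative.

p0 <- 0
p1 <- -4.6e-4
t_iter <- 10000
if (stats::runif(1) < exp(p0 + p1 * t_iter)) {
  # update delta so that s_delta = ceiling(m_eff)
}
exp(p0 + p1 * 10000) # ~ 0.01, so threshold updates become rare late in the chain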

References

Bhattacharya, A., Chakraborty, A., & Mallick, B. K. (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103(4), 985-991.

Johndrow, J., Orenstein, P., & Bhattacharya, A. (2020). Scalable approximate MCMC algorithms for the horseshoe prior. Journal of Machine Learning Research, 21.

Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11, 5018-5051.

Roberts, G. O., & Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44, 458-475.

Examples

# Making simulation data.
set.seed(123)
N <- 200
p <- 100
true_beta <- c(rep(1, 10), rep(0, 90))

X <- matrix(stats::rnorm(N * p), nrow = N, ncol = p) # Design matrix X.
e <- stats::rnorm(N, mean = 0, sd = 2) # Error term e.
y <- as.vector(X %*% true_beta) + e # Response variable y.

# Run with auto.threshold option
result1 <- approx_horseshoe(X, y, burn = 0, iter = 100)

# Run with fixed custom threshold
result2 <- approx_horseshoe(X, y, burn = 0, iter = 100,
                            auto.threshold = FALSE, threshold = 1/(5 * p))

# posterior mean
betahat <- result1$BetaHat

# Lower bound of the 95% credible interval
leftCI <- result1$LeftCI

# Upper bound of the 95% credible interval
rightCI <- result1$RightCI
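
# An additional illustrative check against the simulation truth:
mean((betahat - true_beta)^2) # mean squared error of the posterior mean
result1$ActiveMean # average active set size per iteration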

