approx_horseshoe {Mhorseshoe}    R Documentation
Run approximate MCMC algorithm for horseshoe prior
Description
This function implements the algorithm introduced in Section 2.2 of Johndrow et al.
(2020), a horseshoe estimator that mainly targets the case where p >> N.
The assumptions and notation for the model are the same as those in Mhorseshoe.
The algorithm introduces a threshold and uses only a portion of the p columns of
X for matrix multiplication, lowering the computational cost compared to the
exact horseshoe estimator. According to Section 3.2 of Johndrow et al. (2020),
the approximate MCMC algorithm built on this methodology constructs an
approximate Markov chain P_{\epsilon} that can converge to the exact Markov
chain P, and acceptable results were confirmed through empirical analysis of
simulated and real data. The "auto.threshold" argument of this function enables
an adaptive probability algorithm for the threshold, developed in this package,
which estimates and updates a new threshold from the updated shrinkage
parameters.
Usage
approx_horseshoe(
  X,
  y,
  burn = 1000,
  iter = 5000,
  auto.threshold = TRUE,
  threshold = 0,
  tau = 1,
  s = 0.8,
  sigma2 = 1,
  w = 1,
  alpha = 0.05,
  a = 0.2,
  b = 10,
  t = 10,
  adapt_p0 = 0,
  adapt_p1 = -4.6 * 10^(-4)
)
Arguments
X
Design matrix, X \in R^{N \times p}.

y
Response vector, y \in R^{N}.

burn
Number of burn-in samples. Default is 1000.

iter
Number of samples to be drawn from the posterior. Default is 5000.

auto.threshold
Whether to use the algorithm that automatically updates the threshold using adaptive probability. Default is TRUE.

threshold
Threshold to be used in the approximate MCMC algorithm. If auto.threshold = FALSE and threshold = 0 (the default value of this argument), the threshold is set automatically according to the sizes of N and p.

tau
Initial value of the global shrinkage parameter \tau. Default is 1.

s
Parameter of the Metropolis-Hastings proposal used to update \tau. Default is 0.8.

sigma2
Initial value of the error variance \sigma^{2}. Default is 1.

w
Parameter of the gamma prior for \sigma^{2}. Default is 1.

alpha
100(1 - alpha)% credible intervals are computed for \beta. Default is 0.05 (95% credible intervals).

a
Parameter of the rejection sampler; it is recommended to leave it at the default value, a = 0.2.

b
Parameter of the rejection sampler; it is recommended to leave it at the default value, b = 10.

t
Threshold update cycle of the adaptive probability algorithm when auto.threshold = TRUE. Default is 10.

adapt_p0
Parameter p_{0} of the adaptive probability p(T) = exp[p_{0} + p_{1}T]. Default is 0.

adapt_p1
Parameter p_{1} of the adaptive probability p(T) = exp[p_{0} + p_{1}T]. Default is -4.6 \times 10^{-4}.
Value
BetaHat
Posterior mean of \beta.

LeftCI
Lower bound of the 100(1 - alpha)% credible interval for \beta.

RightCI
Upper bound of the 100(1 - alpha)% credible interval for \beta.

Sigma2Hat
Posterior mean of \sigma^{2}.

TauHat
Posterior mean of \tau.

LambdaHat
Posterior mean of \lambda_{j},\ j=1,2,...,p.

ActiveMean
Average number of elements in the active set per iteration of this algorithm.

BetaSamples
Samples from the posterior of \beta.

LambdaSamples
\lambda samples drawn by rejection sampling.

TauSamples
\tau samples drawn by the MH algorithm.

Sigma2Samples
Samples from the posterior of \sigma^{2}.

ActiveSet
Matrix indicating active elements as 1 and non-active elements as 0 per iteration of the MCMC algorithm.
Approximate algorithm details
The approximate algorithm makes the following changes:

D_{\delta} = diag\left(\eta_{j}^{-1}1\left(\xi^{-1}\eta_{j}^{-1} > \delta \right),\ j=1,2,...,p \right),

M_{\xi} \approx M_{\xi, \delta} = I_{N} + \xi^{-1}XD_{\delta}X^{T},

where \eta_{j} = \lambda_{j}^{-2}, the \lambda_{j} are local shrinkage
parameters, \xi = \tau^{-2}, \tau is the global shrinkage parameter,
1(\cdot) is an indicator function that returns 1 if the condition in the
parentheses is satisfied and 0 otherwise, and \delta is the threshold. The set
of columns of X, \{x_{j}\ :\ \xi^{-1}\eta_{j}^{-1} > \delta,\ j = 1,2,...,p \},
is defined as the active set, and S is defined as its index set:

S = \{j\ :\ \xi^{-1}\eta_{j}^{-1} > \delta,\ j=1,2,...,p \}.
Recall the posterior distribution of \beta:

\beta | y, X, \eta, \xi, \sigma \sim
N\left(\left(X^{T}X + \left(\xi^{-1} D \right)^{-1}\right)^{-1}X^{T}y,
\ \sigma^{2}\left(X^{T}X + \left(\xi^{-1} D \right)^{-1} \right)^{-1}
\right).

If \xi^{-1}\eta_{j}^{-1} is very small, the posterior of \beta_{j} has mean
and variance close to 0. Therefore, the components with \xi^{-1}\eta_{j}^{-1}
smaller than \delta are set to 0, and the size of the matrix to be inverted in
M_{\xi, \delta} is reduced as follows:

length(S) = s_{\delta} \le p, \quad X_{S} \in R^{N \times s_{\delta}},
\quad D_{S} \in R^{s_{\delta} \times s_{\delta}},

M_{\xi, \delta}^{-1} = \left(I_{N} + \xi^{-1}X_{S}D_{S}X_{S}^{T} \right)^{-1}.
M_{\xi, \delta}^{-1} can be expressed using the Woodbury identity as follows:

M_{\xi, \delta}^{-1} = I_{N} - X_{S}\left(\xi D_{S}^{-1} +
X_{S}^{T}X_{S} \right)^{-1}X_{S}^{T}.
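As a quick numerical sanity check of this identity, the sketch below compares the direct N x N inversion with the Woodbury form; the dimensions, X_S, D_S, and xi are arbitrary illustrative values, not quantities produced by the package.

```r
# Numerical check of the Woodbury form of M_{xi, delta}^{-1}.
# N, s_delta, X_S, D_S, and xi are hypothetical stand-ins.
set.seed(1)
N <- 5
s_delta <- 3
X_S <- matrix(rnorm(N * s_delta), nrow = N)
D_S <- diag(runif(s_delta, min = 0.5, max = 2))  # positive diagonal (eta_j^{-1})
xi <- 2                                          # xi = tau^{-2}

# Direct route: invert the N x N matrix M = I_N + xi^{-1} X_S D_S X_S^T.
M <- diag(N) + (1 / xi) * X_S %*% D_S %*% t(X_S)
M_inv_direct <- solve(M)

# Woodbury route: only an s_delta x s_delta system is solved.
M_inv_woodbury <- diag(N) -
  X_S %*% solve(xi * solve(D_S) + crossprod(X_S)) %*% t(X_S)

max(abs(M_inv_direct - M_inv_woodbury))  # agreement up to machine precision
```

When s_delta is much smaller than N, the Woodbury route replaces an N x N inversion with an s_delta x s_delta one, which is the source of the computational savings.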
This M_{\xi, \delta}^{-1}, which reduces the computational cost, is applied to
all parts of the algorithm, and \beta samples are drawn from the posterior
using fast sampling (Bhattacharya et al., 2016) as follows:

u \sim N_{p}(0, \xi^{-1}D_{\delta}),\quad f \sim N_{N}(0, I_{N}),

v = Xu + f,\quad v^{\star} = M_{\xi, \delta}^{-1}(y/\sigma - v),

\beta = \sigma(u + \xi^{-1}D_{\delta}X^{T}v^{\star}).
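The sampling steps above can be sketched in base R as follows; the dimensions and the values of X, y, sigma, xi, eta, and delta are all arbitrary stand-ins for illustration, not values produced by the package.

```r
# One draw of beta via the fast sampler with the thresholded D_delta.
# All inputs (X, y, sigma, xi, eta, delta) are illustrative.
set.seed(2)
N <- 20
p <- 50
X <- matrix(rnorm(N * p), nrow = N)
y <- rnorm(N)
sigma <- 1
xi <- 4                             # xi = tau^{-2}
eta <- rexp(p)                      # eta_j = lambda_j^{-2}
delta <- 1 / p                      # threshold

active <- (1 / (xi * eta)) > delta  # index set S
d <- ifelse(active, 1 / eta, 0)     # diagonal of D_delta

u <- rnorm(p, sd = sqrt(d / xi))    # u ~ N_p(0, xi^{-1} D_delta)
f <- rnorm(N)                       # f ~ N_N(0, I_N)
v <- as.vector(X %*% u + f)

# M_{xi, delta}^{-1} via the Woodbury identity (s_delta x s_delta solve).
X_S <- X[, active, drop = FALSE]
D_S_inv <- diag(eta[active], nrow = sum(active))
M_inv <- diag(N) - X_S %*% solve(xi * D_S_inv + crossprod(X_S)) %*% t(X_S)

v_star <- M_inv %*% (y / sigma - v)
beta <- as.vector(sigma * (u + (d / xi) * as.vector(crossprod(X, v_star))))
```

Note that coordinates outside the active set have d = 0, so their u components and posterior draws are exactly 0 in this sketch, matching the approximation described above.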
Adaptive probability algorithm for threshold update
If the auto.threshold argument is set to TRUE, this algorithm runs every t
iterations to estimate a threshold and decide whether to update it. The update
step applies the properties of the shrinkage weights k_{j},\ j=1,2,...,p
proposed by Piironen and Vehtari (2017). Under the prior \beta_{j} \sim
N(0, \sigma^{2}\tau^{2}\lambda_{j}^{2}) = N(0, \sigma^{2}\xi^{-1}
\eta_{j}^{-1}), the variable m_{eff} is defined as follows:

k_{j} = 1/\left(1+n\xi^{-1}s_{j}^{2}\eta_{j}^{-1} \right),\quad j=1,2,...,p,

m_{eff} = \sum_{j=1}^{p}{\left(1-k_{j} \right)}.
The assumptions and notation for the model are the same as those in
Mhorseshoe, and s_{j},\ j=1,2,...,p are the diagonal components of X^{T}X.
For the zero components of \beta, k_{j} is close to 1, while for the nonzero
components k_{j} is close to 0, so m_{eff}, the sum of the 1-k_{j}, is called
the effective number of nonzero coefficients. In this algorithm, the threshold
\delta is updated so that s_{\delta} = ceiling(m_{eff}).
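The weights k_{j} and m_{eff} can be computed directly from these formulas. In the hypothetical sketch below, 5 of 30 coefficients get weak shrinkage (small eta_j) and the rest get strong shrinkage, so m_{eff} comes out close to 5; X, xi, and eta are arbitrary stand-ins, not package output.

```r
# Effective number of nonzero coefficients from the shrinkage weights k_j.
# n, p, X, xi, and eta are hypothetical stand-ins.
set.seed(3)
n <- 100
p <- 30
X <- matrix(rnorm(n * p), nrow = n)
xi <- 4
eta <- c(rep(1e-4, 5), rep(1e8, 25))  # 5 weakly shrunk, 25 strongly shrunk

s2 <- colSums(X^2)                    # s_j^2: diagonal components of X^T X
k <- 1 / (1 + n * (1 / xi) * s2 * (1 / eta))
m_eff <- sum(1 - k)                   # close to 5 in this setup
s_delta <- ceiling(m_eff)             # target size of the active set
```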
The adaptive probability is defined to satisfy Theorem 5 (the diminishing
adaptation condition) of Roberts and Rosenthal (2007). At the T th iteration,

p(T) = exp[p_{0} + p_{1}T],\quad p_{1} < 0,\quad u \sim U(0, 1),

if\ u < p(T),\ update\ \delta\ so\ that\ s_{\delta} = ceiling(m_{eff}).

The defaults are p_{0} = 0 and p_{1} = -4.6 \times 10^{-4}; under this
setting, p(10000) = e^{-4.6} \approx 0.01.
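A minimal sketch of this update probability with the package defaults p_{0} = 0 and p_{1} = -4.6e-4:

```r
# Adaptive update probability p(T) = exp(p0 + p1 * T) with the defaults
# p0 = 0 and p1 = -4.6e-4; p(T) decays toward 0, so threshold updates
# become increasingly rare as the chain progresses.
p0 <- 0
p1 <- -4.6e-4
p_update <- function(t_iter) exp(p0 + p1 * t_iter)

p_update(0)      # 1: an update is certain at the start
p_update(10000)  # exp(-4.6), roughly 0.01: updates are rare by then

# Within the sampler, at iteration T the threshold delta would be refreshed
# (so that s_delta = ceiling(m_eff)) only when runif(1) < p_update(T).
```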
References
Bhattacharya, A., Chakraborty, A., & Mallick, B. K. (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103(4), 985-991.
Johndrow, J., Orenstein, P., & Bhattacharya, A. (2020). Scalable approximate MCMC algorithms for the horseshoe prior. Journal of Machine Learning Research, 21.
Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11, 5018-5051.
Roberts, G. O., & Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44, 458-475.
Examples
# Making simulation data.
set.seed(123)
N <- 200
p <- 100
true_beta <- c(rep(1, 10), rep(0, 90))
X <- matrix(1, nrow = N, ncol = p) # Design matrix X.
for (i in 1:p) {
X[, i] <- stats::rnorm(N, mean = 0, sd = 1)
}
y <- vector(mode = "numeric", length = N) # Response variable y.
e <- stats::rnorm(N, mean = 0, sd = 2) # Error term e.
for (i in 1:10) {
y <- y + true_beta[i] * X[, i]
}
y <- y + e
# Run with auto.threshold option
result1 <- approx_horseshoe(X, y, burn = 0, iter = 100)
# Run with fixed custom threshold
result2 <- approx_horseshoe(X, y, burn = 0, iter = 100,
auto.threshold = FALSE, threshold = 1/(5 * p))
# Posterior mean
betahat <- result1$BetaHat
# Lower bound of the 95% credible interval
leftCI <- result1$LeftCI
# Upper bound of the 95% credible interval
rightCI <- result1$RightCI