fast_spike_slab_beta {SequenceSpikeSlab} | R Documentation |
Compute marginal posterior estimates for beta-spike-and-slab prior
Description
Computes marginal posterior probabilities (slab probabilities) that data points have non-zero mean for the spike-and-slab prior with a Beta(beta_kappa,beta_lambda) prior on the mixing parameter. The posterior mean is also provided.
Usage
fast_spike_slab_beta(
x,
sigma = 1,
m = 20,
slab = "Laplace",
Laplace_lambda = 0.5,
Cauchy_gamma = 1,
beta_kappa = 1,
beta_lambda,
show_progress = TRUE
)
Arguments
x |
Vector of n data points |
sigma |
Standard deviation of the Gaussian noise in the data. May also be set to "auto", in which case sigma is estimated using the estimateSigma function from the selectiveInference package |
m |
The number of discretization points used is proportional to m*sqrt(n). The larger m, the better the approximation, but the runtime also increases linearly with m. The default m=20 usually gives sufficient numerical precision. |
slab |
Slab distribution. Must be either "Laplace" or "Cauchy". |
Laplace_lambda |
Parameter of the Laplace slab |
Cauchy_gamma |
Parameter of the Cauchy slab |
beta_kappa |
Parameter of the beta-distribution |
beta_lambda |
Parameter of the beta-distribution. Default value=n+1 |
show_progress |
Boolean that indicates whether to show a progress bar |
Details
The run-time is O(m*n^(3/2)) on n data points, which means that doubling the size of the data leads to an increase in computation time by approximately a factor of 2*sqrt(2)=2.8. Data sets of size n=100,000 should be feasible within approximately 30 minutes.
Value
list (postprobs, postmean, sigma), where postprobs is a vector of marginal posterior slab
probabilities that x[i]
has non-zero mean for i=1,...,n
; postmean is a vector with
the posterior mean for the x[i]
; and sigma is the value of sigma (this may be of
interest when the sigma="auto" option is used)
Examples
# Illustrate that fast_spike_slab_beta is a faster way to compute the same results as
# general_sequence_model on the beta-binomial prior
# Generate data
n <- 500 # sample size
n_signal <- 25 # number of non-zero theta
A <- 5 # signal strength
theta <- c(rep(A,n_signal), rep(0,n-n_signal))
x <- theta + rnorm(n, sd=1)
# Choose slab
slab <- "Cauchy"
Cauchy_gamma <- 1
cat("Running fast_spike_slab_beta (fast for very large n)...\n")
res_fss <- fast_spike_slab_beta(x, sigma=1, slab=slab, Cauchy_gamma=Cauchy_gamma)
cat("Running general_sequence_model (slower for very large n)...\n")
res_gsm <- general_sequence_model(x, sigma=1, slab=slab,
log_prior="beta-binomial", Cauchy_gamma=Cauchy_gamma)
cat("Maximum difference in marginal posterior slab probabilities:",
max(abs(res_gsm$postprobs - res_fss$postprobs)))
cat("\nMaximum difference in posterior means:",
max(abs(res_gsm$postmean - res_fss$postmean)), "\n")
# Plot means
M=max(abs(x))+1
plot(1:n, x, pch=20, ylim=c(-M,M), col='green', xlab="", ylab="",
main="Posterior Means (Same for Both Methods)")
points(1:n, theta, pch=20, col='blue')
points(1:n, res_gsm$postmean, pch=20, col='black', cex=0.6)
points(1:n, res_fss$postmean, pch=20, col='magenta', cex=0.6)
legend("topright", legend=c("general_sequence_model", "fast_spike_slab_beta",
"data", "truth"),
col=c("black", "magenta", "green", "blue"), pch=20, cex=0.7)