general_sequence_model {SequenceSpikeSlab} | R Documentation |
Compute marginal posterior estimates
Description
This function computes marginal posterior probabilities (slab probabilities) that data points have non-zero mean for the general hierarchical prior in the sparse normal sequence model. The posterior mean is also provided.
Usage
general_sequence_model(
x,
sigma = 1,
slab = "Laplace",
log_prior = "beta-binomial",
Laplace_lambda = 0.5,
Cauchy_gamma = 1,
beta_kappa = 1,
beta_lambda,
show_progress = TRUE
)
Arguments
x |
Vector of n data points |
sigma |
Standard deviation of the Gaussian noise in the data. May also be set to "auto", in which case sigma is estimated using the estimateSigma function from the selectiveInference package |
slab |
Slab distribution. Must be either "Laplace" or "Cauchy". |
log_prior |
Vector of length n+1 containing the logarithms of the prior probabilities pi_n(s) that the number of spikes is equal to s for s=0,...,n. It is allowed to use an unnormalized prior that does not sum to 1, because adding any constant to the log-prior probabilities does not change the result. Instead of a vector, log_prior may also be set to "beta-binomial" as a short-hand for log_prior = lbeta(beta_kappa+(0:n),beta_lambda+n-(0:n)) - lbeta(beta_kappa,beta_lambda) + lchoose(n,0:n). |
Laplace_lambda |
Parameter of the Laplace slab |
Cauchy_gamma |
Parameter of the Cauchy slab |
beta_kappa |
Parameter of the beta-distribution in the beta-binomial prior |
beta_lambda |
Parameter of the beta-distribution in the beta-binomial prior. Default value=n+1 |
show_progress |
Boolean that indicates whether to show a progress bar |
Details
The run-time is O(n^2) on n data points, which means that doubling the size of the data leads to an increase in computation time by approximately a factor of 4. Data sets of size n=25,000 should be feasible within approximately 30 minutes.
Value
list (postprobs, postmean, sigma), where postprobs is a vector of marginal posterior slab
probabilities that x[i]
has non-zero mean for i=1,...,n
; postmean is a vector with
the posterior mean for the x[i]
; and sigma is the value of sigma (this may be of
interest when the sigma="auto" option is used)
Examples
# Experiments similar to those of Castilo, Van der Vaart, 2012
# Generate data
n <- 500 # sample size
n_signal <- 25 # number of non-zero theta
A <- 5 # signal strength
theta <- c(rep(A,n_signal), rep(0,n-n_signal))
x <- theta + rnorm(n, sd=1)
# Choose slab
slab <- "Laplace"
Laplace_lambda <- 0.5
# Prior 1
kappa1 <- 0.4 # hyperparameter
logprior1 <- c(0,-kappa1*(1:n)*log(n*3/(1:n)))
res1 <- general_sequence_model(x, sigma=1,
slab=slab,
log_prior=logprior1,
Laplace_lambda=Laplace_lambda)
print("Prior 1: Elements with marginal posterior probability >= 0.5:")
print(which(res1$postprobs >= 0.5))
# Prior 2
kappa2 <- 0.8 # hyperparameter
logprior2 <- kappa2*lchoose(2*n-0:n,n)
res2 <- general_sequence_model(x, sigma=1,
slab=slab,
log_prior=logprior2,
Laplace_lambda=Laplace_lambda)
print("Prior 2: Elements with marginal posterior probability >= 0.5:")
print(which(res2$postprobs >= 0.5))
# Prior 3
beta_kappa <- 1 # hyperparameter
beta_lambda <- n+1 # hyperparameter
res3 <- general_sequence_model(x, sigma=1,
slab=slab,
log_prior="beta-binomial",
Laplace_lambda=Laplace_lambda)
print("Prior 3: Elements with marginal posterior probability >= 0.5:")
print(which(res3$postprobs >= 0.5))
# Plot means for all priors
M=max(abs(x))+1
plot(1:n, x, pch=20, ylim=c(-M,M), col='green', xlab="", ylab="", main="Posterior Means")
points(1:n, theta, pch=20, col='blue')
points(1:n, res1$postmean, pch=20, col='black', cex=0.6)
points(1:n, res2$postmean, pch=20, col='magenta', cex=0.6)
points(1:n, res3$postmean, pch=20, col='red', cex=0.6)
legend("topright", legend=c("posterior mean 1", "posterior mean 2", "posterior mean 3",
"data", "truth"),
col=c("black", "magenta", "red", "green", "blue"), pch=20, cex=0.7)