| ebnm {ebnm} | R Documentation |
Solve the EBNM problem
Description
Solves the empirical Bayes normal means (EBNM) problem using a specified family of priors. For an article-length introduction to the package, see Willwerscheid and Stephens (2021), cited in References below.
Usage
ebnm(
x,
s = 1,
prior_family = c("point_normal", "point_laplace", "point_exponential", "normal",
"horseshoe", "normal_scale_mixture", "unimodal", "unimodal_symmetric",
"unimodal_nonnegative", "unimodal_nonpositive", "generalized_binary", "npmle",
"deconvolver", "flat", "point_mass", "ash"),
mode = 0,
scale = "estimate",
g_init = NULL,
fix_g = FALSE,
output = ebnm_output_default(),
optmethod = NULL,
control = NULL,
...
)
ebnm_output_default()
ebnm_output_all()
Arguments
x |
A vector of observations. Missing observations ( |
s |
A vector of standard errors (or a scalar if all are equal).
Standard errors may not be exactly zero, and
missing standard errors are not allowed. Two prior families have
additional restrictions: when horseshoe priors are used, errors
must be homoskedastic; and since function
|
prior_family |
A character string that specifies the prior family
|
mode |
A scalar specifying the mode of the prior |
scale |
A scalar or vector specifying the scale parameter(s) of the
prior or The interpretation of |
g_init |
The prior distribution |
fix_g |
If |
output |
A character vector indicating which values are to be returned.
Function |
optmethod |
A string specifying which optimization function is to be
used. Since all non-parametric families rely upon external packages, this
parameter is only available for parametric families (point-normal,
point-Laplace, point-exponential, and normal). Options include |
control |
A list of control parameters to be passed to the optimization
function. |
... |
Additional parameters. When a |
Details
Given vectors of data x and standard errors s, ebnm
solves the "empirical Bayes normal means" (EBNM) problem for various
choices of prior family.
The model is
x_j | \theta_j, s_j \sim N(\theta_j, s_j^2)
\theta_j \sim g \in G,
where g, which is referred to as the
"prior distribution" for \theta, is to be estimated from among
some specified family of prior distributions G. Several options
for G are implemented, some parametric and others non-parametric;
see below for examples.
Solving the EBNM problem involves
two steps. First, g \in G is estimated via maximum marginal likelihood:
\hat{g} := \arg\max_{g \in G} L(g),
where
L(g) := \prod_j \int p(x_j | \theta_j, s_j) g(d\theta_j).
Second, posterior distributions
p(\theta_j | x_j, s_j, \hat{g}) and/or summaries
such as posterior means and posterior second moments are computed.
Implemented prior families include:
point_normalThe family of mixtures where one component is a point mass at
\muand the other is a normal distribution centered at\mu.point_laplaceThe family of mixtures where one component is a point mass at
\muand the other is a double-exponential distribution centered at\mu.point_exponentialThe family of mixtures where one component is a point mass at
\muand the other is a (nonnegative) exponential distribution with mode\mu.normalThe family of normal distributions.
horseshoeThe family of horseshoe distributions.
normal_scale_mixtureThe family of scale mixtures of normals.
unimodalThe family of all unimodal distributions.
unimodal_symmetricThe family of symmetric unimodal distributions.
unimodal_nonnegativeThe family of unimodal distributions with support constrained to be greater than the mode.
unimodal_nonpositiveThe family of unimodal distributions with support constrained to be less than the mode.
generalized_binaryThe family of mixtures where one component is a point mass at zero and the other is a truncated normal distribution with lower bound zero and nonzero mode. See Liu et al. (2023), cited in References below.
npmleThe family of all distributions.
deconvolverA non-parametric exponential family with a natural spline basis. Like
npmle, there is no unimodal assumption, but whereasnpmleproduces spiky estimates forg,deconvolverestimates are much more regular. SeedeconvolveR-packagefor details and references.flatThe "non-informative" improper uniform prior, which yields posteriors
\theta_j | x_j, s_j \sim N(x_j, s_j^2).point_massThe family of point masses
\delta_\mu. Posteriors are point masses at\mu.ashCalls into function
ashin packageashr. Can be used to make direct comparisons ofebnmandashrimplementations of prior families such as scale mixtures of normals and the NPMLE.
Value
An ebnm object. Depending on the argument to output, the
object is a list containing elements:
dataA data frame containing the observations
xand standard errorss.posteriorA data frame of summary results (posterior means, standard deviations, second moments, and local false sign rates).
fitted_gThe fitted prior
\hat{g}(an object of classnormalmix,laplacemix,gammamix,unimix,tnormalmix, orhorseshoe).log_likelihoodThe optimal log likelihood attained,
L(\hat{g}).posterior_samplerA function that can be used to produce samples from the posterior. For all prior families other than the horseshoe, the sampler takes a single parameter
nsamp, the number of posterior samples to return per observation. Sinceebnm_horseshoereturns an MCMC sampler, it additionally takes parameterburn, the number of burn-in samples to discard.
S3 methods coef, confint, fitted, logLik,
nobs, plot, predict, print, quantile,
residuals, simulate, summary, and vcov
have been implemented for ebnm objects. For details, see the
respective help pages, linked below under See Also.
Functions
-
ebnm_output_default(): Lists the default return values. -
ebnm_output_all(): Lists all valid return values.
References
Jason Willwerscheid and Matthew Stephens (2021).
ebnm: an R Package for solving the empirical Bayes
normal means problem using a variety of prior families. arXiv:2110.00152.
Yusha Liu, Peter Carbonetto, Jason Willwerscheid, Scott A Oakes, Kay F Macleod, and Matthew Stephens (2023). Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition. bioRxiv 2023.08.15.553436.
See Also
A plotting method is available for ebnm objects: see
plot.ebnm.
For other methods, see coef.ebnm, confint.ebnm,
fitted.ebnm, logLik.ebnm,
nobs.ebnm, predict.ebnm,
print.ebnm, print.summary.ebnm,
quantile.ebnm, residuals.ebnm,
simulate.ebnm,
summary.ebnm, and vcov.ebnm.
Calling into functions ebnm_point_normal,
ebnm_point_laplace,
ebnm_point_exponential, ebnm_normal,
ebnm_horseshoe,
ebnm_normal_scale_mixture, ebnm_unimodal,
ebnm_unimodal_symmetric,
ebnm_unimodal_nonnegative,
ebnm_unimodal_nonpositive,
ebnm_generalized_binary, ebnm_npmle,
ebnm_deconvolver, ebnm_flat,
ebnm_point_mass, and ebnm_ash
is equivalent to calling into ebnm with prior_family set
accordingly.
Examples
theta <- c(rep(0, 100), rexp(100))
s <- 1
x <- theta + rnorm(200, 0, s)
# The following are equivalent:
pn.res <- ebnm(x, s, prior_family = "point_normal")
pn.res <- ebnm_point_normal(x, s)
# Inspect results:
logLik(pn.res)
plot(pn.res)
# Fix the scale parameter:
pl.res <- ebnm_point_laplace(x, s, scale = 1)
# Estimate the mode:
normal.res <- ebnm_normal(x, s, mode = "estimate")
plot(normal.res) # posterior means shrink to a value different from zero
# Use an initial g (this fixes mode and scale for ash priors):
normalmix.res <- ebnm_normal_scale_mixture(x, s, g_init = pn.res)
# Fix g and get different output (including a posterior sampler):
pn.res <- ebnm_point_normal(x, s, g_init = pn.res, fix_g = TRUE,
output = ebnm_output_all())
# Sample from the posterior:
pn.samp <- simulate(pn.res, nsim = 100)
# Quantiles and HPD confidence intervals can be obtained via sampling:
set.seed(1)
pn.quantiles <- quantile(pn.res, probs = c(0.1, 0.9))
pn.quantiles[1:5, ]
confint(pn.res, level = 0.8, parm = 1:5)
# Examples of usage of control parameter:
# point_normal uses nlm:
pn.res <- ebnm_point_normal(x, s, control = list(print.level = 1))
# unimodal uses mixsqp:
unimodal.res <- ebnm_unimodal(x, s, control = list(verbose = TRUE))