ebnm {ebnm} | R Documentation |
Solve the EBNM problem
Description
Solves the empirical Bayes normal means (EBNM) problem using a specified family of priors. For an article-length introduction to the package, see Willwerscheid and Stephens (2021), cited in References below.
Usage
ebnm(
x,
s = 1,
prior_family = c("point_normal", "point_laplace", "point_exponential", "normal",
"horseshoe", "normal_scale_mixture", "unimodal", "unimodal_symmetric",
"unimodal_nonnegative", "unimodal_nonpositive", "generalized_binary", "npmle",
"deconvolver", "flat", "point_mass", "ash"),
mode = 0,
scale = "estimate",
g_init = NULL,
fix_g = FALSE,
output = ebnm_output_default(),
optmethod = NULL,
control = NULL,
...
)
ebnm_output_default()
ebnm_output_all()
Arguments
x |
A vector of observations. Missing observations ( |
s |
A vector of standard errors (or a scalar if all are equal).
Standard errors may not be exactly zero, and
missing standard errors are not allowed. Two prior families have
additional restrictions: when horseshoe priors are used, errors
must be homoskedastic; and since function
|
prior_family |
A character string that specifies the prior family
|
mode |
A scalar specifying the mode of the prior |
scale |
A scalar or vector specifying the scale parameter(s) of the
prior or The interpretation of |
g_init |
The prior distribution |
fix_g |
If |
output |
A character vector indicating which values are to be returned.
Function |
optmethod |
A string specifying which optimization function is to be
used. Since all non-parametric families rely upon external packages, this
parameter is only available for parametric families (point-normal,
point-Laplace, point-exponential, and normal). Options include |
control |
A list of control parameters to be passed to the optimization
function. |
... |
Additional parameters. When a |
Details
Given vectors of data x
and standard errors s
, ebnm
solves the "empirical Bayes normal means" (EBNM) problem for various
choices of prior family.
The model is
x_j | \theta_j, s_j \sim N(\theta_j, s_j^2)
\theta_j \sim g \in G,
where g
, which is referred to as the
"prior distribution" for \theta
, is to be estimated from among
some specified family of prior distributions G
. Several options
for G
are implemented, some parametric and others non-parametric;
see below for examples.
Solving the EBNM problem involves
two steps. First, g \in G
is estimated via maximum marginal likelihood:
\hat{g} := \arg\max_{g \in G} L(g),
where
L(g) := \prod_j \int p(x_j | \theta_j, s_j) g(d\theta_j).
Second, posterior distributions
p(\theta_j | x_j, s_j, \hat{g})
and/or summaries
such as posterior means and posterior second moments are computed.
Implemented prior families include:
point_normal
The family of mixtures where one component is a point mass at
\mu
and the other is a normal distribution centered at\mu
.point_laplace
The family of mixtures where one component is a point mass at
\mu
and the other is a double-exponential distribution centered at\mu
.point_exponential
The family of mixtures where one component is a point mass at
\mu
and the other is a (nonnegative) exponential distribution with mode\mu
.normal
The family of normal distributions.
horseshoe
The family of horseshoe distributions.
normal_scale_mixture
The family of scale mixtures of normals.
unimodal
The family of all unimodal distributions.
unimodal_symmetric
The family of symmetric unimodal distributions.
unimodal_nonnegative
The family of unimodal distributions with support constrained to be greater than the mode.
unimodal_nonpositive
The family of unimodal distributions with support constrained to be less than the mode.
generalized_binary
The family of mixtures where one component is a point mass at zero and the other is a truncated normal distribution with lower bound zero and nonzero mode. See Liu et al. (2023), cited in References below.
npmle
The family of all distributions.
deconvolver
A non-parametric exponential family with a natural spline basis. Like
npmle
, there is no unimodal assumption, but whereasnpmle
produces spiky estimates forg
,deconvolver
estimates are much more regular. SeedeconvolveR-package
for details and references.flat
The "non-informative" improper uniform prior, which yields posteriors
\theta_j | x_j, s_j \sim N(x_j, s_j^2).
point_mass
The family of point masses
\delta_\mu
. Posteriors are point masses at\mu
.ash
Calls into function
ash
in packageashr
. Can be used to make direct comparisons ofebnm
andashr
implementations of prior families such as scale mixtures of normals and the NPMLE.
Value
An ebnm
object. Depending on the argument to output
, the
object is a list containing elements:
data
A data frame containing the observations
x
and standard errorss
.posterior
A data frame of summary results (posterior means, standard deviations, second moments, and local false sign rates).
fitted_g
The fitted prior
\hat{g}
(an object of classnormalmix
,laplacemix
,gammamix
,unimix
,tnormalmix
, orhorseshoe
).log_likelihood
The optimal log likelihood attained,
L(\hat{g})
.posterior_sampler
A function that can be used to produce samples from the posterior. For all prior families other than the horseshoe, the sampler takes a single parameter
nsamp
, the number of posterior samples to return per observation. Sinceebnm_horseshoe
returns an MCMC sampler, it additionally takes parameterburn
, the number of burn-in samples to discard.
S3 methods coef
, confint
, fitted
, logLik
,
nobs
, plot
, predict
, print
, quantile
,
residuals
, simulate
, summary
, and vcov
have been implemented for ebnm
objects. For details, see the
respective help pages, linked below under See Also.
Functions
-
ebnm_output_default()
: Lists the default return values. -
ebnm_output_all()
: Lists all valid return values.
References
Jason Willwerscheid and Matthew Stephens (2021).
ebnm
: an R
Package for solving the empirical Bayes
normal means problem using a variety of prior families. arXiv:2110.00152.
Yusha Liu, Peter Carbonetto, Jason Willwerscheid, Scott A Oakes, Kay F Macleod, and Matthew Stephens (2023). Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition. bioRxiv 2023.08.15.553436.
See Also
A plotting method is available for ebnm
objects: see
plot.ebnm
.
For other methods, see coef.ebnm
, confint.ebnm
,
fitted.ebnm
, logLik.ebnm
,
nobs.ebnm
, predict.ebnm
,
print.ebnm
, print.summary.ebnm
,
quantile.ebnm
, residuals.ebnm
,
simulate.ebnm
,
summary.ebnm
, and vcov.ebnm
.
Calling into functions ebnm_point_normal
,
ebnm_point_laplace
,
ebnm_point_exponential
, ebnm_normal
,
ebnm_horseshoe
,
ebnm_normal_scale_mixture
, ebnm_unimodal
,
ebnm_unimodal_symmetric
,
ebnm_unimodal_nonnegative
,
ebnm_unimodal_nonpositive
,
ebnm_generalized_binary
, ebnm_npmle
,
ebnm_deconvolver
, ebnm_flat
,
ebnm_point_mass
, and ebnm_ash
is equivalent to calling into ebnm
with prior_family
set
accordingly.
Examples
theta <- c(rep(0, 100), rexp(100))
s <- 1
x <- theta + rnorm(200, 0, s)
# The following are equivalent:
pn.res <- ebnm(x, s, prior_family = "point_normal")
pn.res <- ebnm_point_normal(x, s)
# Inspect results:
logLik(pn.res)
plot(pn.res)
# Fix the scale parameter:
pl.res <- ebnm_point_laplace(x, s, scale = 1)
# Estimate the mode:
normal.res <- ebnm_normal(x, s, mode = "estimate")
plot(normal.res) # posterior means shrink to a value different from zero
# Use an initial g (this fixes mode and scale for ash priors):
normalmix.res <- ebnm_normal_scale_mixture(x, s, g_init = pn.res)
# Fix g and get different output (including a posterior sampler):
pn.res <- ebnm_point_normal(x, s, g_init = pn.res, fix_g = TRUE,
output = ebnm_output_all())
# Sample from the posterior:
pn.samp <- simulate(pn.res, nsim = 100)
# Quantiles and HPD confidence intervals can be obtained via sampling:
set.seed(1)
pn.quantiles <- quantile(pn.res, probs = c(0.1, 0.9))
pn.quantiles[1:5, ]
confint(pn.res, level = 0.8, parm = 1:5)
# Examples of usage of control parameter:
# point_normal uses nlm:
pn.res <- ebnm_point_normal(x, s, control = list(print.level = 1))
# unimodal uses mixsqp:
unimodal.res <- ebnm_unimodal(x, s, control = list(verbose = TRUE))