R: Solve the EBNM problem

ebnm {ebnm}

R Documentation

Solve the EBNM problem

Description

Solves the empirical Bayes normal means (EBNM) problem using a specified family of priors. For an article-length introduction to the package, see Willwerscheid and Stephens (2021), cited in References below.

Usage

ebnm(
  x,
  s = 1,
  prior_family = c("point_normal", "point_laplace", "point_exponential", "normal",
    "horseshoe", "normal_scale_mixture", "unimodal", "unimodal_symmetric",
    "unimodal_nonnegative", "unimodal_nonpositive", "generalized_binary", "npmle",
    "deconvolver", "flat", "point_mass", "ash"),
  mode = 0,
  scale = "estimate",
  g_init = NULL,
  fix_g = FALSE,
  output = ebnm_output_default(),
  optmethod = NULL,
  control = NULL,
  ...
)

ebnm_output_default()

ebnm_output_all()

Arguments

`x`	A vector of observations. Missing observations (`NA`s) are not allowed.
`s`	A vector of standard errors (or a scalar if all are equal). Standard errors may not be exactly zero, and missing standard errors are not allowed. Two prior families have additional restrictions: when horseshoe priors are used, errors must be homoskedastic; and since function `deconv` in package `deconvolveR` takes `z`-scores, the "deconvolver" family requires that all standard errors be equal to 1.
`prior_family`	A character string that specifies the prior family `G`. See Details below.
`mode`	A scalar specifying the mode of the prior `g` or `"estimate"` if the mode is to be estimated from the data. This parameter is ignored by the NPMLE, the `deconvolveR` family, and the improper uniform (or "flat") prior. For generalized binary priors, which are bimodal, the mode parameter specifies the mode of the truncated normal component (the location of the point mass is fixed at zero).
`scale`	A scalar or vector specifying the scale parameter(s) of the prior or `"estimate"` if the scale parameters are to be estimated from the data. This parameter is ignored by the flat prior and the family of point mass priors. The interpretation of `scale` depends on the prior family. For normal and point-normal families, it is a scalar specifying the standard deviation of the normal component. For point-Laplace and point-exponential families, it is a scalar specifying the scale parameter of the Laplace or exponential component. For the horseshoe family, it corresponds to `s\tau` in the usual parametrization of the `horseshoe` distribution. For the family of generalized binary priors, it specifies the ratio of the (untruncated) standard deviation of the normal component to its mode. This ratio must be fixed in advance (i.e., argument `"estimate"` is unavailable for generalized binary priors). For the NPMLE and `deconvolveR` prior family, `scale` is a scalar specifying the distance between successive means in the grid of point masses or normal distributions used to estimate `g`. For all other prior families, which are implemented using the function `ash` in package `ashr`, it is a vector specifying the parameter `mixsd` to be passed to `ash` or `"estimate"` if `mixsd` is to be chosen by `ebnm`. (Note that `ebnm` chooses `mixsd` differently from `ash`: see functions `ebnm_scale_normalmix`, `ebnm_scale_unimix`, and `ebnm_scale_npmle` for details. To use the `ash` grid, set `scale = "estimate"` and pass in `gridmult` as an additional parameter. See `ash` for defaults and details.)
`g_init`	The prior distribution `g`. Usually this is left unspecified (`NULL`) and estimated from the data. However, it can be used in conjuction with `fix_g = TRUE` to fix the prior (useful, for example, to do computations with the "true" `g` in simulations). If `g_init` is specified but `fix_g = FALSE`, `g_init` specifies the initial value of `g` used during optimization. For non-parametric priors, this has the side effect of fixing the `mode` and `scale` parameters. If `g_init` is supplied, it should be an object of class `normalmix` for normal, point-normal, scale mixture of normals, and `deconvolveR` prior families, as well as for the NPMLE; class `laplacemix` for point-Laplace families; class `gammamix` for point-exponential families; class `horseshoe` for horseshoe families; class `unimix` for `unimodal_` families; or class `tnormalmix` for generalized binary priors. An object of class `ebnm` can also be supplied as argument, provided that field `fitted_g` contains a prior of the correct class (see Examples below).
`fix_g`	If `TRUE`, fix the prior `g` at `g_init` instead of estimating it.
`output`	A character vector indicating which values are to be returned. Function `ebnm_output_default()` provides the default return values, while `ebnm_output_all()` lists all possible return values. See Value below.
`optmethod`	A string specifying which optimization function is to be used. Since all non-parametric families rely upon external packages, this parameter is only available for parametric families (point-normal, point-Laplace, point-exponential, and normal). Options include `"nlm"`, `"lbfgsb"` (which calls `optim` with `method = "L-BFGS-B"`), and `"trust"` (which calls into package `trust`). Other options are `"nohess_nlm"`, `"nograd_nlm"`, and `"nograd_lbfgsb"`, which use numerical approximations rather than exact expressions for the Hessian and (for the latter two) the gradient. The default option is `"nohess_nlm"`.
`control`	A list of control parameters to be passed to the optimization function. `optimize` is used for normal and horseshoe prior families, while `nlm` is used for parametric families unless parameter `optmethod` specifies otherwise. `nlm` is also used for the `deconvolveR` prior family. For ash families (including scale mixtures of normals, the NPMLE, and all `unimodal_` families), function `mixsqp` in package `mixsqp` is the default. For generalized binary priors, function `optim` is used with `method = "L-BFGS-B"`.
`...`	Additional parameters. When a `unimodal_` prior family is used, these parameters are passed to function `ash` in package `ashr`. Although it does not call into `ashr`, the scale mixture of normals family accepts parameter `gridmult` for purposes of comparison. When `gridmult` is set, an `ashr`-style grid will be used instead of the default `ebnm` grid. When the "deconvolver" family is used, additional parameters are passed to function `deconv` in package `deconvolveR`. Families of generalized binary priors take several additional parameters; see `ebnm_generalized_binary`. In all other cases, additional parameters are ignored.

Details

Given vectors of data x and standard errors s, ebnm solves the "empirical Bayes normal means" (EBNM) problem for various choices of prior family. The model is

x_j | \theta_j, s_j \sim N(\theta_j, s_j^2)

\theta_j \sim g \in G,

where g, which is referred to as the "prior distribution" for \theta, is to be estimated from among some specified family of prior distributions G. Several options for G are implemented, some parametric and others non-parametric; see below for examples.

Solving the EBNM problem involves two steps. First, g \in G is estimated via maximum marginal likelihood:

\hat{g} := \arg\max_{g \in G} L(g),

where

L(g) := \prod_j \int p(x_j | \theta_j, s_j) g(d\theta_j).

Second, posterior distributions p(\theta_j | x_j, s_j, \hat{g}) and/or summaries such as posterior means and posterior second moments are computed.

Implemented prior families include:

point_normal

The family of mixtures where one component is a point mass at \mu and the other is a normal distribution centered at \mu.

point_laplace

The family of mixtures where one component is a point mass at \mu and the other is a double-exponential distribution centered at \mu.

point_exponential

The family of mixtures where one component is a point mass at \mu and the other is a (nonnegative) exponential distribution with mode \mu.

normal

The family of normal distributions.

horseshoe

The family of horseshoe distributions.

normal_scale_mixture

The family of scale mixtures of normals.

unimodal

The family of all unimodal distributions.

unimodal_symmetric

The family of symmetric unimodal distributions.

unimodal_nonnegative

The family of unimodal distributions with support constrained to be greater than the mode.

unimodal_nonpositive

The family of unimodal distributions with support constrained to be less than the mode.

generalized_binary

The family of mixtures where one component is a point mass at zero and the other is a truncated normal distribution with lower bound zero and nonzero mode. See Liu et al. (2023), cited in References below.

npmle

The family of all distributions.

deconvolver

A non-parametric exponential family with a natural spline basis. Like npmle, there is no unimodal assumption, but whereas npmle produces spiky estimates for g, deconvolver estimates are much more regular. See deconvolveR-package for details and references.

flat

The "non-informative" improper uniform prior, which yields posteriors

\theta_j | x_j, s_j \sim N(x_j, s_j^2).

point_mass

The family of point masses \delta_\mu. Posteriors are point masses at \mu.

ash

Calls into function ash in package ashr. Can be used to make direct comparisons of ebnm and ashr implementations of prior families such as scale mixtures of normals and the NPMLE.

Value

An ebnm object. Depending on the argument to output, the object is a list containing elements:

data: A data frame containing the observations x and standard errors s.
posterior: A data frame of summary results (posterior means, standard deviations, second moments, and local false sign rates).
fitted_g: The fitted prior \hat{g} (an object of class normalmix, laplacemix, gammamix, unimix, tnormalmix, or horseshoe).
log_likelihood: The optimal log likelihood attained, L(\hat{g}).
posterior_sampler: A function that can be used to produce samples from the posterior. For all prior families other than the horseshoe, the sampler takes a single parameter nsamp, the number of posterior samples to return per observation. Since ebnm_horseshoe returns an MCMC sampler, it additionally takes parameter burn, the number of burn-in samples to discard.

S3 methods coef, confint, fitted, logLik, nobs, plot, predict, print, quantile, residuals, simulate, summary, and vcov have been implemented for ebnm objects. For details, see the respective help pages, linked below under See Also.

Functions

ebnm_output_default(): Lists the default return values.
ebnm_output_all(): Lists all valid return values.

References

Jason Willwerscheid and Matthew Stephens (2021). ebnm: an R Package for solving the empirical Bayes normal means problem using a variety of prior families. arXiv:2110.00152.

Yusha Liu, Peter Carbonetto, Jason Willwerscheid, Scott A Oakes, Kay F Macleod, and Matthew Stephens (2023). Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition. bioRxiv 2023.08.15.553436.

Examples

theta <- c(rep(0, 100), rexp(100))
s <- 1
x <- theta + rnorm(200, 0, s)

# The following are equivalent:
pn.res <- ebnm(x, s, prior_family = "point_normal")
pn.res <- ebnm_point_normal(x, s)

# Inspect results:
logLik(pn.res)
plot(pn.res)

# Fix the scale parameter:
pl.res <- ebnm_point_laplace(x, s, scale = 1)

# Estimate the mode:
normal.res <- ebnm_normal(x, s, mode = "estimate")
plot(normal.res) # posterior means shrink to a value different from zero

# Use an initial g (this fixes mode and scale for ash priors):
normalmix.res <- ebnm_normal_scale_mixture(x, s, g_init = pn.res)

# Fix g and get different output (including a posterior sampler):
pn.res <- ebnm_point_normal(x, s, g_init = pn.res, fix_g = TRUE,
                            output = ebnm_output_all())

# Sample from the posterior:
pn.samp <- simulate(pn.res, nsim = 100)

# Quantiles and HPD confidence intervals can be obtained via sampling:
set.seed(1)
pn.quantiles <- quantile(pn.res, probs = c(0.1, 0.9))
pn.quantiles[1:5, ]
confint(pn.res, level = 0.8, parm = 1:5)

# Examples of usage of control parameter:
#  point_normal uses nlm:
pn.res <- ebnm_point_normal(x, s, control = list(print.level = 1))
#  unimodal uses mixsqp:
unimodal.res <- ebnm_unimodal(x, s, control = list(verbose = TRUE))

[Package ebnm version 1.1-2 Index]