flash {flashier} | R Documentation |
Empirical Bayes matrix factorization
Description
Fits an empirical Bayes matrix factorization (see Details for a
description of the model). The resulting fit is referred to as a "flash"
object (short for Factors and Loadings using Adaptive SHrinkage). Two
interfaces are provided. The flash
function provides a simple
interface that allows a flash object to be fit in a single pass, while
flash_xxx
functions are pipeable functions that allow for more
complex flash objects to be fit incrementally (available functions are
listed below under See Also). See the vignettes and
Examples for usage.
Usage
flash(
data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L
)
Arguments
data |
The observations. Usually a matrix, but can also be a sparse
matrix of class |
S |
The standard errors. Can be |
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
var_type |
Describes the structure of the estimated residual variance.
Can be Note that if any portion of the residual variance is to be estimated, then
it is usually faster to set |
greedy_Kmax |
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by |
backfit |
A "greedy" fit is performed by adding up to
|
nullcheck |
If |
verbose |
When and how to display progress updates. Set to
|
Details
If is an
data matrix, then the rank-one
empirical Bayes matrix factorization model is:
where is an
-vector of loadings,
is a
-vector of factors, and
is an
matrix of residuals (or "errors").
Additionally:
The residual variance parameters are constrained to have
a simple structure and are fit via maximum likelihood. (For example, one
might assume that all standard errors are identical:
for some
and for all
,
).
The functions
and
are assumed to belong to
some families of priors
and
that are
specified in advance, and are estimated via variational approximation.
The general rank- empirical Bayes matrix factorization model is:
or
where is now a matrix of loadings and
is a matrix of
factors.
Separate priors and
are estimated via
empirical Bayes, and different prior families may be used for different
values of
. In general, then:
Typically, and
will be closed under
scaling, in which case
and
are only identifiable
up to a scaling factor
. In other words, we can write:
where is a diagonal matrix with diagonal entries
.
The model can then be made identifiable by constraining the scale of
and
for
.
Value
A flash
object. Contains elements:
n_factors
The total number of factor/loadings pairs
in the fitted model.
pve
The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo
The variational lower bound achieved by the fitted model.
residuals_sd
Estimated residual standard deviations (these include any variance component given as an argument to
S
).L_pm, L_psd, L_lfsr
Posterior means, standard deviations, and local false sign rates for loadings
.
F_pm, F_psd, F_lfsr
Posterior means, standard deviations, and local false sign rates for factors
.
L_ghat
The fitted priors on loadings
.
F_ghat
The fitted priors on factors
.
sampler
A function that takes a single argument
nsamp
and returnsnsamp
samples from the posterior distributions for factorsand loadings
.
flash_fit
A
flash_fit
object. Used byflash
when fitting is not performed all at once, but incrementally via calls to variousflash_xxx
functions.
The following methods are available:
fitted.flash
Returns the "fitted values"
.
residuals.flash
Returns the expected residuals
.
ldf.flash
Returns an
decomposition (see Details above), with columns of
and
scaled as specified by the user.
References
Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.
See Also
flash_init
, flash_greedy
,
flash_backfit
, and flash_nullcheck
. For more
advanced functionality, see flash_factors_init
,
flash_factors_fix
, flash_factors_set_to_zero
,
flash_factors_remove
, flash_set_verbose
, and
flash_set_conv_crit
.
For extracting useful data from flash
objects, see
fitted.flash
, residuals.flash
, and
ldf.flash
.
Examples
data(gtex)
# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
# This is equivalent to the series of calls:
fl <- flash_init(gtex) %>%
flash_greedy(Kmax = 3L) %>%
flash_backfit() %>%
flash_nullcheck()
# Fit a unimodal distribution with mean zero to each set of loadings
# and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
ebnm_fn = c(ebnm_unimodal,
ebnm_normal_scale_mixture),
greedy_Kmax = 3)
# Fit point-laplace priors using a non-default optimization method.
fl <- flash(gtex,
ebnm_fn = flash_ebnm(prior_family = "point_laplace",
optmethod = "trust"),
greedy_Kmax = 3)
# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)