R: Empirical Bayes matrix factorization

flash {flashier}

R Documentation

Empirical Bayes matrix factorization

Description

Fits an empirical Bayes matrix factorization (see Details for a description of the model). The resulting fit is referred to as a "flash" object (short for Factors and Loadings using Adaptive SHrinkage). Two interfaces are provided. The flash function provides a simple interface that allows a flash object to be fit in a single pass, while flash_xxx functions are pipeable functions that allow for more complex flash objects to be fit incrementally (available functions are listed below under See Also). See the vignettes and Examples for usage.

Usage

flash(
  data,
  S = NULL,
  ebnm_fn = ebnm_point_normal,
  var_type = 0L,
  greedy_Kmax = 50L,
  backfit = FALSE,
  nullcheck = TRUE,
  verbose = 1L
)

Arguments

`data`	The observations. Usually a matrix, but can also be a sparse matrix of class `Matrix` or a low-rank matrix representation as returned by, for example, `svd`, `irlba`, `rsvd`, or `softImpute` (in general, any list that includes fields `u`, `d`, and `v` will be interpreted as a low-rank matrix representation).
`S`	The standard errors. Can be `NULL` (in which case all residual variance will be estimated) or a matrix, vector, or scalar. `S` should be a scalar if standard errors are identical across observations. It should be a vector if standard errors are either constant within each row (one value per row) or constant within each column (one value per column). `flash` uses the length of the vector to determine whether the supplied values correspond to rows or columns; if the data matrix is square, the intended dimension must be specified using parameter `S_dim` in function `flash_init`.
`ebnm_fn`	The function or functions used to solve the empirical Bayes normal means (EBNM) subproblems. Most importantly, these functions specify the families of distributions `G_\ell^{(k)}` and `G_f^{(k)}` to which the priors on loadings and factors `g_\ell^{(k)}` and `g_f^{(k)}` are assumed to belong. If the same function is to be used for both loadings `L` and factors `F`, then `ebnm_fn` can be a single function. If one function is to be used for loadings and a second for factors, then `ebnm_fn` should be a list of length two, with the first element giving the function for loadings and the second the function for factors. If different functions are to be used for different values of `k`, then factor/loadings pairs must be added successively using multiple calls to either `flash_greedy` or `flash_factors_init`. Any EBNM function provided by package `ebnm` can be used as input. Non-default arguments to EBNM functions can be supplied using the helper function `flash_ebnm`. Custom EBNM functions can also be used: for details, see `flash_ebnm`.
`var_type`	Describes the structure of the estimated residual variance. Can be `NULL`, `0`, `1`, `2`, or `c(1, 2)`. If `NULL`, then `S` accounts for all residual variance. If `var_type = 0`, then the estimated residual variance (which is added to any variance given by `S`) is assumed to be constant across all observations. Setting `var_type = 1` estimates a single variance parameter for each row; `var_type = 2` estimates one parameter for each column; and `var_type = c(1, 2)` optimizes over all rank-one matrices (that is, it assumes that the residual variance parameter `s_{ij}^2` can be written `s_{ij}^2 = a_i b_j`, where the `n`-vector `a` and the `p`-vector `b` are to be estimated). Note that if any portion of the residual variance is to be estimated, then it is usually faster to set `S = NULL` and to let `flash` estimate all of the residual variance. Further, `var_type = c(1, 2)` is typically much slower than other options, so it should be used with care.
`greedy_Kmax`	The maximum number of factors to be added. This will not necessarily be the total number of factors added by `flash`, since factors are only added as long as they increase the variational lower bound on the log likelihood for the model.
`backfit`	A "greedy" fit is performed by adding up to `greedy_Kmax` factors, optimizing each newly added factor in one go without returning to optimize previously added factors. When `backfit = TRUE`, `flash` will additionally perform a final "backfit" where all factors are cyclically updated until convergence. The backfitting procedure typically takes much longer than the greedy algorithm, but it also usually improves the final fit to a significant degree.
`nullcheck`	If `nullcheck = TRUE`, then `flash` will check that each factor in the final flash object improves the overall fit. Any factor that fails the check will be removed.
`verbose`	When and how to display progress updates. Set to `0` for none, `1` for updates after a factor is added or a backfit is completed, `2` for additional notifications about the variational lower bound, and `3` for updates after every iteration. It is also possible to output a single tab-delimited table of values using function `flash_set_verbose` with `verbose = -1`.
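
The variance structures selected by `var_type` can be pictured with a small base R sketch. It does not call `flash`; the values `s2`, `a`, and `b` are made-up numbers standing in for parameters that `flash` would estimate, and the `V_*` names are hypothetical.

```r
# Illustrative only: build the n x p residual variance matrices implied by
# each var_type setting, using made-up (not estimated) parameter values.
n <- 3; p <- 4
s2 <- 0.5                      # var_type = 0: a single shared variance
a  <- c(1, 2, 3)               # hypothetical row parameters (length n)
b  <- c(0.1, 0.2, 0.3, 0.4)    # hypothetical column parameters (length p)

V_constant  <- matrix(s2, n, p)                # var_type = 0
V_by_row    <- matrix(a, n, p)                 # var_type = 1: constant within each row
V_by_column <- matrix(b, n, p, byrow = TRUE)   # var_type = 2: constant within each column
V_kronecker <- outer(a, b)                     # var_type = c(1, 2): entry (i, j) is a_i * b_j
```

The last structure has `n + p` parameters rather than `n * p`, which is what makes it a rank-one matrix.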

Details

If Y is an n \times p data matrix, then the rank-one empirical Bayes matrix factorization model is:

Y = \ell f' + E,

where \ell is an n-vector of loadings, f is a p-vector of factors, and E is an n \times p matrix of residuals (or "errors"). Additionally:

e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p

\ell \sim g_\ell \in G_\ell

f \sim g_f \in G_f.

The residual variance parameters s_{ij}^2 are constrained to have a simple structure and are fit via maximum likelihood. (For example, one might assume that all residual variances are identical: s_{ij}^2 = s^2 for some s^2 and for all i, j). The priors g_\ell and g_f are assumed to belong to some families of priors G_\ell and G_f that are specified in advance, and are estimated via variational approximation.
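
To make the rank-one model concrete, the following base R sketch simulates data from it. The particular priors (a point-normal prior on loadings, a standard normal prior on factors) and the constant residual standard deviation are assumptions chosen for illustration, not defaults taken from the package.

```r
set.seed(1)
n <- 100; p <- 30

# Assumed priors, for illustration only:
#   loadings: point-normal (exactly zero with probability 0.7, else N(0, 2^2))
#   factors:  N(0, 1)
ell <- ifelse(runif(n) < 0.7, 0, rnorm(n, sd = 2))
f   <- rnorm(p)

s <- 0.5                                   # constant residual SD: s_ij = s for all i, j
E <- matrix(rnorm(n * p, sd = s), n, p)    # e_ij ~ N(0, s^2)
Y <- outer(ell, f) + E                     # Y = ell f' + E
```

A matrix simulated this way can be passed as the data argument to flash, which would then estimate g_\ell, g_f, and s^2 from Y alone.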

The general rank-K empirical Bayes matrix factorization model is:

Y = LF' + E

y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,

where L is now a matrix of loadings and F is a matrix of factors.

Separate priors g_\ell^{(k)} and g_f^{(k)} are estimated via empirical Bayes, and different prior families may be used for different values of k. In general, then:

e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p

\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K

f_{jk} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.
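
The matrix form and the elementwise form of the rank-K model describe the same thing, which a short base R check can confirm. The normal draws below are placeholders rather than any particular prior family, and the name `Fmat` is used only to avoid masking R's `F` shorthand.

```r
set.seed(2)
n <- 6; p <- 5; K <- 3
L    <- matrix(rnorm(n * K), n, K)             # n x K loadings
Fmat <- matrix(rnorm(p * K), p, K)             # p x K factors
E    <- matrix(rnorm(n * p, sd = 0.1), n, p)   # residuals

Y <- L %*% t(Fmat) + E                         # Y = LF' + E

# Elementwise form: y_ij = sum_k l_ik f_jk + e_ij
i <- 2; j <- 4
y_ij <- sum(L[i, ] * Fmat[j, ]) + E[i, j]

# Equivalent view: a sum of K rank-one terms plus residuals
Y_sum <- E
for (k in seq_len(K)) {
  Y_sum <- Y_sum + outer(L[, k], Fmat[, k])
}
```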

Typically, G_\ell^{(k)} and G_f^{(k)} will be closed under scaling, in which case \ell_k and f_k are only identifiable up to a scaling factor d_k. In other words, we can write:

Y = LDF' + E,

where D is a diagonal matrix with diagonal entries d_1, ..., d_K. The model can then be made identifiable by constraining the scale of \ell_k and f_k for k = 1, ..., K.
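
The scale ambiguity and its resolution can be sketched in base R. The example constrains each column of L and F to unit Euclidean norm; that is one possible convention chosen for illustration, and all variable names are hypothetical.

```r
set.seed(3)
n <- 6; p <- 5; K <- 2
L    <- matrix(rnorm(n * K), n, K)
Fmat <- matrix(rnorm(p * K), p, K)

# Non-identifiability: rescaling column k of L by c and of F by 1/c
# leaves the product LF' unchanged.
c_k <- 10
L2 <- L;    L2[, 1] <- L[, 1] * c_k
F2 <- Fmat; F2[, 1] <- Fmat[, 1] / c_k
same_product <- isTRUE(all.equal(L %*% t(Fmat), L2 %*% t(F2)))

# One way to fix the scale: unit-norm columns, with the scale moved into D.
l_norm <- sqrt(colSums(L^2))
f_norm <- sqrt(colSums(Fmat^2))
L_s <- sweep(L, 2, l_norm, "/")
F_s <- sweep(Fmat, 2, f_norm, "/")
d   <- l_norm * f_norm                         # diagonal entries d_1, ..., d_K

LDF <- L_s %*% diag(d, nrow = K) %*% t(F_s)    # equals LF'
```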

Value

A flash object. Contains elements:

n_factors: The total number of factor/loadings pairs K in the fitted model.
pve: The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo: The variational lower bound achieved by the fitted model.
residuals_sd: Estimated residual standard deviations (these include any variance component given as an argument to S).
L_pm, L_psd, L_lfsr: Posterior means, standard deviations, and local false sign rates for loadings L.
F_pm, F_psd, F_lfsr: Posterior means, standard deviations, and local false sign rates for factors F.
L_ghat: The fitted priors on loadings \hat{g}_\ell^{(k)}.
F_ghat: The fitted priors on factors \hat{g}_f^{(k)}.
sampler: A function that takes a single argument nsamp and returns nsamp samples from the posterior distributions for factors F and loadings L.
flash_fit: A flash_fit object. Used by flash when fitting is not performed all at once, but incrementally via calls to various flash_xxx functions.

The following methods are available:

fitted.flash: Returns the "fitted values" E(LF') = E(L) E(F)'.
residuals.flash: Returns the expected residuals Y - E(LF') = Y - E(L) E(F)'.
ldf.flash: Returns an LDF decomposition (see Details above), with columns of L and F scaled as specified by the user.
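
A brief sketch of how these elements and methods might be used on a small simulated matrix follows. It assumes that flashier is installed (the block is skipped otherwise) and that `ldf` returns a list with components `L`, `D`, and `F`; the simulated data and variable names are illustrative only.

```r
# Sketch only: runs if the flashier package is available.
if (requireNamespace("flashier", quietly = TRUE)) {
  library(flashier)
  set.seed(1)
  n <- 50; p <- 20
  # Rank-one signal plus noise (made-up data for illustration).
  Y <- outer(rnorm(n), rnorm(p)) + matrix(rnorm(n * p, sd = 0.5), n, p)

  fl <- flash(Y, greedy_Kmax = 2L, verbose = 0L)

  fl$n_factors            # number of factor/loadings pairs retained
  fl$pve                  # proportion of variance explained per pair
  fl$elbo                 # variational lower bound

  Yhat <- fitted(fl)      # E(L) E(F)'
  res  <- residuals(fl)   # Y - E(L) E(F)'

  dec <- ldf(fl)          # assumed to be a list with components L, D, F
  str(dec)
}
```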

References

Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.

Examples

data(gtex)

# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)

# This is equivalent to the series of calls:
fl <- flash_init(gtex) %>%
  flash_greedy(Kmax = 3L) %>%
  flash_backfit() %>%
  flash_nullcheck()

# Fit a unimodal distribution with mean zero to each set of loadings
#   and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
            ebnm_fn = c(ebnm_unimodal,
                        ebnm_normal_scale_mixture),
            greedy_Kmax = 3)

# Fit point-laplace priors using a non-default optimization method.
fl <- flash(gtex,
            ebnm_fn = flash_ebnm(prior_family = "point_laplace",
                                 optmethod = "trust"),
            greedy_Kmax = 3)

# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)

[Package flashier version 1.0.7]