R: Greedily add factors to a flash object

flash_greedy {flashier}

R Documentation

Greedily add factors to a flash object

Description

Adds factor/loadings pairs to a flash object in a "greedy" manner. Up to Kmax pairs are added one at a time. At each step, flash_greedy attempts to find an optimal additional (rank-one) factor given all previously added factors. The additional factor is retained if it increases the variational lower bound (ELBO); otherwise, fitting terminates. See flash for examples of usage.

Usage

flash_greedy(
  flash,
  Kmax = 1,
  ebnm_fn = ebnm_point_normal,
  init_fn = NULL,
  extrapolate = FALSE,
  warmstart = FALSE,
  maxiter = 500,
  tol = NULL,
  verbose = NULL
)

Arguments

`flash`	A `flash` or `flash_fit` object to which factors are to be added.
`Kmax`	The maximum number of factors to be added. This will not necessarily be the total number of factors added by `flash_greedy`, since factors are only added as long as they increase the ELBO.
`ebnm_fn`	The function or functions used to solve the empirical Bayes normal means (EBNM) subproblems. Most importantly, these functions specify the families of distributions `G_\ell^{(k)}` and `G_f^{(k)}` to which the priors on loadings and factors `g_\ell^{(k)}` and `g_f^{(k)}` are assumed to belong. If the same function is to be used for both loadings `L` and factors `F`, then `ebnm_fn` can be a single function. If one function is to be used for loadings and a second for factors, then `ebnm_fn` should be a list of length two, with the first element giving the function for loadings and the second the function for factors. If different functions are to be used for different values of `k`, then factor/loadings pairs must be added successively using multiple calls to either `flash_greedy` or `flash_factors_init`. Any EBNM function provided by package `ebnm` can be used as input. Non-default arguments to parameters can be supplied using the helper function `flash_ebnm`. Custom EBNM functions can also be used: for details, see `flash_ebnm`.
`init_fn`	The function used to initialize factor/loadings pairs. Functions `flash_greedy_init_default`, `flash_greedy_init_softImpute`, and `flash_greedy_init_irlba` have been supplied; note, in particular, that `flash_greedy_init_softImpute` can yield better results than the default initialization function when there is missing data. Custom initialization functions may also be used. If `init_fn = NULL` then `flash_greedy_init_default` will be used, with an attempt made to set argument `sign_constraints` appropriately via test calls to the EBNM function(s) specified by parameter `ebnm_fn`. If factors or loadings are constrained in some other fashion (e.g., bounded support), then the initialization function should be modified to account for the constraints — otherwise, the greedy algorithm can stop adding factor/loadings pairs too early. Custom initialization functions should accept a single parameter referring to a `flash_fit` object and should output a list consisting of two vectors, which will be used as initial values for the new loadings `\ell_{\cdot k}` and the new factor `f_{\cdot k}`. Typically, a custom initialization function will extract the matrix of residuals from the `flash_fit` object using method `residuals.flash_fit` and then return a (possibly constrained) rank-one approximation to the matrix of residuals. See Examples below.
`extrapolate`	Whether to use an extrapolation technique inspired by Ang and Gillis (2019) to accelerate the fitting process. Control parameters are handled via global options and can be set by calling `options("extrapolate.control") <- control.param`.
`warmstart`	Whether to use "warmstarts" when solving the EBNM subproblems by initializing solutions at the previous value of the fitted prior `\hat{g}`. An important side effect of warmstarts for `ashr`-like prior families is to fix the grid at its initial setting. Fixing the grid can lead to poor fits if there are large changes in the scale of the estimated prior over the course of the fitting process. However, allowing the grid to vary can occasionally result in decreases in ELBO.
`maxiter`	The maximum number of iterations when optimizing a greedily added factor/loadings pair.
`tol`	The convergence tolerance parameter. At each iteration, the fit is compared to the fit from the previous iteration using a convergence criterion function (by default, the difference in ELBO, but the criterion can be changed via `flash_set_conv_crit`). When the value returned by this function is less than or equal to `tol`, the newly added factor/loadings pair is considered to have "converged," so that `flash_greedy` moves on and attempts to add another new factor (or, if the maximum number of factors `Kmax` has been reached, the process terminates). Note that specifying `tol` here will override any value set by `flash_set_conv_crit`; to use the "global" tolerance parameter, `tol` must be left unspecified (`NULL`). If `tol = NULL` and a global tolerance parameter has not been set, then the default tolerance used is `np\sqrt{\epsilon}`, where `n` is the number of rows in the dataset, `p` is the number of columns, and `\epsilon` is equal to `.Machine$double.eps`.
`verbose`	When and how to display progress updates. Set to `0` for none, `1` for updates after a factor is added or a backfit is completed, `2` for additional notifications about the variational lower bound, and `3` for updates after every iteration. It is also possible to output a single tab-delimited table of values using function `flash_set_verbose` with `verbose = -1`.

Value

The flash object from argument flash, with up to Kmax new factor/loadings pairs "greedily" added.

Examples

# The following are examples of advanced usage. See ?flash for basic usage.

# Increase the maximum number of iterations in the default initialization
#   method.
my_init_fn <- function(f) flash_greedy_init_default(f, maxiter = 500)
fl <- flash_init(gtex) %>%
  flash_greedy(init_fn = my_init_fn)

# Use a custom initialization function that wraps function nmf from
#   package RcppML.
nmf_init_fn <- function(f) {
  nmf_res <- RcppML::nmf(resid(f), k = 1, verbose = FALSE)
  return(list(as.vector(nmf_res$w), as.vector(nmf_res$h)))
}
fl.nmf <- flash_init(gtex) %>%
  flash_greedy(ebnm_fn = ebnm_unimodal_nonnegative,
               init_fn = nmf_init_fn)