R: Perform iNMF on scaled datasets

runINMF {rliger}

R Documentation

Perform iNMF on scaled datasets

Description

Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, and V matrices. The objective function is stated as

\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+ \lambda\sum_{i}^{d}||V_iH_i||_F^2

where E_i is the input non-negative matrix of the i'th dataset, d is the total number of datasets. E_i is of size m \times n_i for m variable genes and n_i cells, H_i is of size n_i \times k, V_i is of size m \times k, and W is of size m \times k.

The factorization produces a shared W matrix (genes by k), and for each dataset, an H matrix (k by cells) and a V matrix (genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.

This function adopts highly optimized fast and memory efficient implementation extended from Planc (Kannan, 2016). Pre-installation of extension package RcppPlanc is required. The underlying algorithm adopts the identical ANLS strategy as optimizeALS in the old version of LIGER.

Usage

runINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "inmf",
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

`object`	A liger object or a Seurat object with non-negative scaled data of variable features (Done with `scaleNotCenter`).
`k`	Inner dimension of factorization (number of factors). Generally, a higher `k` will be needed for datasets with more sub-structure. Default `20`.
`lambda`	Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as `lambda` increases). Default `5`.
`...`	Arguments passed to methods.
`nIteration`	Total number of block coordinate descent iterations to perform. Default `30`.
`nRandomStarts`	Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initialization is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default `1`.
`HInit`	Initial values to use for `H` matrices. A list object where each element is the initial `H` matrix of each dataset. Default `NULL`.
`WInit`	Initial values to use for `W` matrix. A matrix object. Default `NULL`.
`VInit`	Initial values to use for `V` matrices. A list object where each element is the initial `V` matrix of each dataset. Default `NULL`.
`seed`	Random seed to allow reproducible results. Default `1`.
`nCores`	The number of parallel tasks to speed up the computation. Default `2L`. Only supported for platform with OpenMP support.
`verbose`	Logical. Whether to show information of the progress. Default `getOption("ligerVerbose")` or `TRUE` if users have not set.
`datasetVar`	Metadata variable name that stores the dataset source annotation. Default `"orig.ident"`.
`layer`	For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default `"ligerScaleData"`. For older Seurat, always retrieve from `scale.data` slot.
`assay`	Name of assay to use. Default `NULL` uses current active assay.
`reduction`	Name of the reduction to store result. Also used as the feature key. Default `"inmf"`.

Value

liger method - Returns updated input liger object
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
- H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
- W matrix will be presented as feature.loadings in the same DimReduc object.
- V matrices, an objective error value and the dataset variable used for the factorization is currently stored in misc slot of the same DimReduc object.

Difference from optimizeALS()

In the old version implementation, we compute the objective error at the end of each iteration, and then compares if the algorithm is reaching a convergence, using an argument thresh. Now, since the computation of objective error is indeed expensive, we canceled this feature and directly runs a default of 30 (nIteration) iterations, which empirically leads to a convergence most of the time. Given that the new version is highly optimized, running this many iteration should be acceptable.

References

Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc)
}

[Package rliger version 2.0.1 Index]