runUINMF {rliger}R Documentation

Perform Mosaic iNMF (UINMF) on scaled datasets with unshared features

Description

Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized HH, WW, VV and UU matrices. The objective function is stated as

argminH0,W0,V0,U0id[EiPi]([W0]+[ViUi])HiF2+λiid[ViUi]HiF2\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i}^{d} ||\begin{bmatrix}E_i \\ P_i \end{bmatrix} - (\begin{bmatrix}W \\ 0 \end{bmatrix}+ \begin{bmatrix}V_i \\ U_i \end{bmatrix})Hi||^2_F+ \lambda_i\sum_{i}^{d}||\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i||_F^2

where EiE_i is the input non-negative matrix of the ii'th dataset, PiP_i is the input non-negative matrix for the unshared features, dd is the total number of datasets. EiE_i is of size m×nim \times n_i for mm shared features and nin_i cells, PiP_i is of size ui×niu_i \times n_i for uiu_i unshared feaetures, HiH_i is of size k×nik \times n_i, ViV_i is of size m×km \times k, WW is of size m×km \times k and UiU_i is of size ui×ku_i \times k.

The factorization produces a shared WW matrix (genes by k). For each dataset, an HH matrix (k by cells), a VV matrix (genes by k) and a UU matrix (unshared genes by k). The HH matrices represent the cell factor loadings. WW is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The VV matrices represent the dataset-specific components of the metagenes, UU matrices are similar to VVs but represents the loading contributed by unshared features.

This function adopts highly optimized fast and memory efficient implementation extended from Planc (Kannan, 2016). Pre-installation of extension package RcppPlanc is required. The underlying algorithm adopts the identical ANLS strategy as optimizeALS(unshared = TRUE) in the old version of LIGER.

Usage

runUINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object

liger object. Should run selectGenes with unshared = TRUE and then run scaleNotCenter in advance.

k

Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

...

Arguments passed to other methods and wrapped functions.

nIteration

Total number of block coordinate descent iterations to perform. Default 30.

nRandomStarts

Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initialization is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default 1.

seed

Random seed to allow reproducible results. Default 1.

nCores

The number of parallel tasks to speed up the computation. Default 2L. Only supported for platform with OpenMP support.

verbose

Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set.

Value

Note

Currently, Seurat S3 method is not supported for UINMF because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object which has the specific structure.

References

April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}

[Package rliger version 2.0.1 Index]