R: Perform Mosaic iNMF (UINMF) on scaled datasets with unshared...

runUINMF {rliger}

R Documentation

Perform Mosaic iNMF (UINMF) on scaled datasets with unshared features

Description

Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized H, W, V and U matrices. The objective function is stated as

\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i}^{d} ||\begin{bmatrix}E_i \\ P_i \end{bmatrix} - (\begin{bmatrix}W \\ 0 \end{bmatrix}+ \begin{bmatrix}V_i \\ U_i \end{bmatrix})Hi||^2_F+ \lambda_i\sum_{i}^{d}||\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i||_F^2

where E_i is the input non-negative matrix of the i'th dataset, P_i is the input non-negative matrix for the unshared features, d is the total number of datasets. E_i is of size m \times n_i for m shared features and n_i cells, P_i is of size u_i \times n_i for u_i unshared feaetures, H_i is of size k \times n_i, V_i is of size m \times k, W is of size m \times k and U_i is of size u_i \times k.

The factorization produces a shared W matrix (genes by k). For each dataset, an H matrix (k by cells), a V matrix (genes by k) and a U matrix (unshared genes by k). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes, U matrices are similar to Vs but represents the loading contributed by unshared features.

This function adopts highly optimized fast and memory efficient implementation extended from Planc (Kannan, 2016). Pre-installation of extension package RcppPlanc is required. The underlying algorithm adopts the identical ANLS strategy as optimizeALS(unshared = TRUE) in the old version of LIGER.

Usage

runUINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

`object`	liger object. Should run `selectGenes` with `unshared = TRUE` and then run `scaleNotCenter` in advance.
`k`	Inner dimension of factorization (number of factors). Generally, a higher `k` will be needed for datasets with more sub-structure. Default `20`.
`lambda`	Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as `lambda` increases). Default `5`.
`...`	Arguments passed to other methods and wrapped functions.
`nIteration`	Total number of block coordinate descent iterations to perform. Default `30`.
`nRandomStarts`	Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initialization is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default `1`.
`seed`	Random seed to allow reproducible results. Default `1`.
`nCores`	The number of parallel tasks to speed up the computation. Default `2L`. Only supported for platform with OpenMP support.
`verbose`	Logical. Whether to show information of the progress. Default `getOption("ligerVerbose")` or `TRUE` if users have not set.

Value

liger method - Returns updated input liger object.
- A list of all H matrices can be accessed with getMatrix(object, "H")
- A list of all V matrices can be accessed with getMatrix(object, "V")
- The W matrix can be accessed with getMatrix(object, "W")
- A list of all U matrices can be accessed with getMatrix(object, "U")

Note

Currently, Seurat S3 method is not supported for UINMF because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object which has the specific structure.

References

April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}

[Package rliger version 2.0.1 Index]