optimizeNewData {rliger}

R Documentation

Perform factorization for new data

Description

Uses an efficient updating strategy that takes advantage of the information in the existing factorization. Assumes that the variable features are present in the new datasets. Two modes are supported (controlled by `merge`):

`merge = TRUE`: Append the new data to the existing datasets specified by `useDatasets`. The existing V matrices of the target datasets are used directly as initialization, and new H matrices for the merged matrices are initialized accordingly.
`merge = FALSE`: Add the new data as new datasets. Their initial V matrices are copied from the datasets specified by `useDatasets`, and new H matrices are initialized accordingly.
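A minimal base-R sketch of what initializing H "accordingly" can look like, assuming the standard iNMF objective with the shared W and an inherited V held fixed. Under that assumption the loadings H for the new cells solve a nonnegative least-squares problem. `solveHSketch()` and `inmfObjective()` are hypothetical helpers, not rliger functions, and Lee-Seung multiplicative updates are swapped in for brevity; this is not the solver rliger itself uses.

```r
# Hypothetical sketch, not rliger's implementation.
# With W (genes x k) and V (genes x k) fixed, each column h of H solves
#   min_{h >= 0} ||x - (W + V) h||^2 + lambda * ||V h||^2,
# which equals one stacked problem min ||B - A h||^2 with
#   A = rbind(W + V, sqrt(lambda) * V),  B = rbind(X, 0).
solveHSketch <- function(X, W, V, lambda = 5, nIteration = 200, eps = 1e-10) {
  A <- rbind(W + V, sqrt(lambda) * V)             # (2 * genes) x k
  B <- rbind(X, matrix(0, nrow(V), ncol(X)))      # (2 * genes) x cells
  AtA <- crossprod(A)                             # t(A) %*% A, k x k
  AtB <- crossprod(A, B)                          # t(A) %*% B, k x cells
  H <- matrix(1, ncol(W), ncol(X))                # strictly positive start
  for (iter in seq_len(nIteration)) {
    # Multiplicative update: stays nonnegative because A and B are nonnegative
    H <- H * AtB / (AtA %*% H + eps)
  }
  H
}

# Objective value for one dataset (Frobenius norms)
inmfObjective <- function(X, W, V, H, lambda) {
  sum((X - (W + V) %*% H)^2) + lambda * sum((V %*% H)^2)
}

# Usage on synthetic nonnegative data (genes x cells, like dataNew)
set.seed(1)
genes <- 50; k <- 3; cells <- 20
W <- matrix(runif(genes * k), genes, k)
V <- matrix(runif(genes * k), genes, k)
X <- (W + V) %*% matrix(runif(k * cells), k, cells)
H1 <- solveHSketch(X, W, V, lambda = 5, nIteration = 1)
H200 <- solveHSketch(X, W, V, lambda = 5, nIteration = 200)
print(c(inmfObjective(X, W, V, H1, 5), inmfObjective(X, W, V, H200, 5)))
```

Reusing a fitted V (and W) means only H has to be estimated from scratch, which is why starting from the existing factorization is cheaper than refitting everything.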

Usage

optimizeNewData(
  object,
  dataNew,
  useDatasets,
  merge = TRUE,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.data = dataNew,
  which.datasets = useDatasets,
  add.to.existing = merge,
  max.iters = nIteration,
  thresh = NULL
)

Arguments

`object`	A liger object. Should already have an integrative factorization performed (e.g. with `runINMF`).
`dataNew`	Named list of raw count matrices, genes by cells.
`useDatasets`	Datasets to append the new data to when `merge = TRUE`, or datasets whose `V` matrices are inherited to initialize the optimization when `merge = FALSE`. Must match `dataNew` in length and order.
`merge`	Logical, whether to add the new data to existing datasets or treat them as entirely new datasets (i.e., compute new `V` matrices). Default `TRUE`.
`lambda`	Numeric regularization parameter. Default `NULL` uses the lambda value from the latest factorization.
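For reference, `lambda` weights the dataset-specific term of the iNMF objective. Writing E_i for the scaled genes-by-cells matrix of dataset i, W for the shared genes-by-k factor, V_i for the dataset-specific genes-by-k factor, and H_i for the k-by-cells loadings (orientation assumed to follow `dataNew`), the objective over d datasets is:

```latex
\min_{W,\,V_i,\,H_i \ge 0} \; \sum_{i=1}^{d} \left\lVert E_i - (W + V_i) H_i \right\rVert_F^2
  \; + \; \lambda \sum_{i=1}^{d} \left\lVert V_i H_i \right\rVert_F^2
```

A larger `lambda` penalizes dataset-specific structure more heavily, so the new factorization leans more on the shared W.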
`nIteration`	Number of block coordinate descent iterations to perform. Default `30`.
`seed`	Random seed for reproducible results, used by the `runINMF` factorization. Default `1`.
`verbose`	Logical. Whether to show progress information. Default `getOption("ligerVerbose")`, which is `TRUE` if the user has not set it.
`new.data`, `which.datasets`, `add.to.existing`, `max.iters`	Deprecated. Superseded by `dataNew`, `useDatasets`, `merge`, and `nIteration`, respectively, and will be removed in the future.
`thresh`	Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough `nIteration` will bring it to convergence.

Value

`object` with the `W` slot updated with the new W matrix, and the `H` and `V` slots of each ligerDataset object in the `datasets` slot updated with the new dataset-specific H and V matrices, respectively.

Examples

library(rliger)
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Create fake new data by adding 1 to every non-zero count in "ctrl",
    # and make the cell identifiers unique by appending "2"
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", nIteration = 2)
}
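The example above uses the default `merge = TRUE`. The sketch below, built only from the arguments and slots documented on this page, runs the same fake data with `merge = FALSE`, so "ctrl2" becomes a separate dataset whose V is initialized from "ctrl". It assumes that the name in `dataNew` becomes the new dataset's name; the object and dataset names are illustrative.

```r
library(rliger)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- scaleNotCenter(selectGenes(normalize(pbmc)))
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Same fake data as in the example above
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    # merge = FALSE: add "ctrl2" as a new dataset, inheriting V from "ctrl"
    # (assumption: the list name "ctrl2" is used as the dataset name)
    pbmcSep <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", merge = FALSE,
                               nIteration = 2)
    # Inspect the factors through the slots described under Value
    print(dim(pbmcSep@W))
    print(dim(pbmcSep@datasets[["ctrl2"]]@H))
    print(dim(pbmcSep@datasets[["ctrl2"]]@V))
}
```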