IntegrateData {Seurat}    R Documentation

Integrate data

Description

Perform dataset integration using a pre-computed AnchorSet.

Usage

IntegrateData(
  anchorset,
  new.assay.name = "integrated",
  normalization.method = c("LogNormalize", "SCT"),
  features = NULL,
  features.to.integrate = NULL,
  dims = 1:30,
  k.weight = 100,
  weight.reduction = NULL,
  sd.weight = 1,
  sample.tree = NULL,
  preserve.order = FALSE,
  eps = 0,
  verbose = TRUE
)

Arguments

anchorset

An AnchorSet object generated by FindIntegrationAnchors

new.assay.name

Name for the new assay containing the integrated data

normalization.method

Name of normalization method used: LogNormalize or SCT

features

Vector of features to use when computing the PCA to determine the weights. Only set this if you want a different set of features from those used in the anchor finding process.

features.to.integrate

Vector of features to integrate. By default, will use the features used in anchor finding.

dims

Dimensions to use in the anchor weighting procedure (for example, 1:30 uses the first 30 dimensions)

k.weight

Number of neighbors to consider when weighting anchors

weight.reduction

Dimension reduction to use when calculating anchor weights. This can be one of:

  • A string, specifying the name of a dimension reduction present in all objects to be integrated

  • A vector of strings, specifying the name of a dimension reduction to use for each object to be integrated

  • A vector of DimReduc objects, specifying the object to use for each object in the integration

  • NULL, in which case a new PCA will be calculated and used to calculate anchor weights

Note that, if specified, the requested dimension reduction will only be used for calculating anchor weights in the first merge between reference and query. After that, the merged object contains more cells than were in the query, so weights need to be calculated for all cells in the object.

sd.weight

Controls the bandwidth of the Gaussian kernel for weighting

sample.tree

Specifies the order of integration, encoded as a matrix in which each row represents one pairwise integration step. Negative numbers specify a dataset; positive numbers specify the result of the integration step in the given row (the format of the merge matrix returned by hclust). For example, matrix(c(-2, 1, -3, -1), ncol = 2) gives:

            [,1]  [,2]
       [1,]   -2   -3
       [2,]    1   -1

This causes datasets 2 and 3 to be integrated first, and the resulting object to then be integrated with dataset 1.

If NULL, the sample tree will be computed automatically.

preserve.order

If TRUE, do not reorder objects based on size for each pairwise integration.

eps

Error bound on the neighbor finding algorithm (from RANN)

verbose

Print progress bars and output
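The hclust-style encoding described under sample.tree can be checked before it is passed to IntegrateData. The helper below, describe_sample_tree(), is hypothetical (it is not a Seurat function); it is a sketch that only validates the row structure (negative entries are dataset indices, positive entries refer to an earlier row) and spells out each pairwise step.

```r
# Hypothetical helper, not part of Seurat: validate an hclust-style merge
# matrix as described for sample.tree and describe each integration step.
describe_sample_tree <- function(tree, n.datasets) {
  stopifnot(is.matrix(tree), ncol(tree) == 2, nrow(tree) == n.datasets - 1)
  steps <- character(nrow(tree))
  for (i in seq_len(nrow(tree))) {
    parts <- vapply(tree[i, ], function(x) {
      if (x < 0) {
        stopifnot(-x <= n.datasets)   # negative entry: a dataset index
        paste("dataset", -x)
      } else {
        stopifnot(x >= 1, x < i)      # positive entry: must be an earlier row
        paste("result of row", x)
      }
    }, character(1))
    steps[i] <- paste(parts, collapse = " + ")
  }
  steps
}

tree <- matrix(c(-2, 1, -3, -1), ncol = 2)
res <- describe_sample_tree(tree, n.datasets = 3)
# res is c("dataset 2 + dataset 3", "result of row 1 + dataset 1")
```

A row that points at itself or a later row (for example, a 2 in row 2) makes the helper stop with an error, since that step's input would not exist yet.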

Details

The main steps of this procedure are outlined below. For a more detailed description of the methodology, please see Stuart, Butler, et al., Cell 2019. doi:10.1016/j.cell.2019.05.031; doi:10.1101/460147

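For pairwise integration:

  • Construct a weights matrix that defines the association between each query cell and each anchor. These weights are computed as 1 minus the distance between the query cell and the anchor divided by the distance of the query cell to the k.weight-th anchor, multiplied by the anchor score computed in FindIntegrationAnchors. We then apply a Gaussian kernel with a bandwidth defined by sd.weight and normalize across all k.weight anchors.

  • Compute the anchor integration matrix as the difference between the two expression matrices for every pair of anchor cells.

  • Compute the transformation matrix as the product of the integration matrix and the weights matrix.

  • Subtract the transformation matrix from the original expression matrix.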
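As a rough illustration of one pairwise step (a weights matrix from query-to-anchor distances and anchor scores, a Gaussian kernel controlled by sd.weight, normalization over k.weight anchors, then subtracting the weighted anchor differences), here is a minimal base-R sketch. It is not Seurat's implementation: the exact kernel form exp(-(1 - s)^2 / (2 * sd.weight^2)) is an assumption, neighbors are found by brute force rather than RANN, distances are computed directly on toy expression values rather than in a PCA or weight.reduction space, and all function and variable names are hypothetical.

```r
# Weights matrix: query cells (rows) x anchors (columns).
anchor_weights <- function(dist.to.anchors, scores, k, sd.weight = 1) {
  stopifnot(ncol(dist.to.anchors) == length(scores), k >= 1, k <= length(scores))
  w <- matrix(0, nrow = nrow(dist.to.anchors), ncol = ncol(dist.to.anchors))
  for (i in seq_len(nrow(dist.to.anchors))) {
    nn <- order(dist.to.anchors[i, ])[seq_len(k)]  # k nearest anchors
    d <- dist.to.anchors[i, nn]
    dk <- max(d[k], .Machine$double.eps)           # distance to the k-th anchor
    s <- (1 - d / dk) * scores[nn]                 # 1 - d / d_k, times anchor score
    g <- exp(-(1 - s)^2 / (2 * sd.weight^2))       # assumed Gaussian kernel
    w[i, nn] <- g / sum(g)                         # normalize over the k anchors
  }
  w
}

# Integration matrix -> transformation matrix -> corrected query.
correct_query <- function(query, ref, anchor.query, anchor.ref, w) {
  # genes x anchors: query minus reference expression for each anchor pair
  integration.matrix <- query[, anchor.query, drop = FALSE] -
    ref[, anchor.ref, drop = FALSE]
  transformation <- integration.matrix %*% t(w)    # genes x query cells
  query - transformation
}

# Toy data (hypothetical): the query is reference cells 1-5 shifted by +2.
set.seed(42)
ref <- matrix(rnorm(4 * 6), nrow = 4)   # 4 genes x 6 reference cells
query <- ref[, 1:5] + 2                 # 4 genes x 5 query cells
anchor.query <- 1:5                     # anchor pairs: query cell j
anchor.ref <- 1:5                       #   paired with reference cell j
d.qa <- as.matrix(dist(t(query)))[, anchor.query]
w <- anchor_weights(d.qa, scores = rep(1, 5), k = 3)
corrected <- correct_query(query, ref, anchor.query, anchor.ref, w)
all.equal(corrected, ref[, 1:5])
```

Because each row of the weights matrix sums to 1, a batch effect that is the same shift for every anchor pair is removed entirely (up to floating-point error); real batch effects vary across cells, which is what the local, kernel-weighted anchors are meant to capture.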

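For multiple dataset integration, we perform iterative pairwise integration. To determine the order of integration (if not specified via sample.tree), we:

  • Define a distance between datasets as the total number of cells in the smaller dataset divided by the total number of anchors between the two datasets.

  • Compute all pairwise distances between datasets.

  • Cluster this distance matrix to determine a guide tree.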
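The guide-tree construction (distance between two datasets = cells in the smaller dataset divided by the anchors between them, then hierarchical clustering) can be sketched in base R. The cell and anchor counts below are hypothetical, and using hclust() with its default complete linkage is an assumption rather than a statement of what Seurat does internally. The resulting merge matrix has the same encoding as sample.tree.

```r
n.cells <- c(A = 1000, B = 2500, C = 800)   # hypothetical cells per dataset
n.anchors <- matrix(c(  0, 300,  50,         # hypothetical anchor counts
                      300,   0, 400,
                       50, 400,   0),
                    nrow = 3, dimnames = list(names(n.cells), names(n.cells)))

# distance = cells in the smaller dataset / anchors between the two datasets
# (diagonal entries become Inf, but as.dist() keeps only the lower triangle)
dataset.dist <- outer(n.cells, n.cells, pmin) / n.anchors
guide.tree <- hclust(as.dist(dataset.dist))
guide.tree$merge   # hclust merge matrix, the encoding used by sample.tree
```

Here B and C are closest (800 / 400 = 2), so they form the first row of the merge matrix and A joins in the second row. Assuming datasets are numbered in the order of the object list used to build the anchor set, a matrix like this could be supplied as sample.tree to fix the order explicitly.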

Value

Returns a Seurat object with a new integrated Assay. If normalization.method = "LogNormalize", the integrated data is returned to the data slot and can be treated as log-normalized, corrected data. If normalization.method = "SCT", the integrated data is returned to the scale.data slot and can be treated as centered, corrected Pearson residuals.

References

Stuart T, Butler A, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888-1902. doi:10.1016/j.cell.2019.05.031

Examples

## Not run: 
# to install the SeuratData package see https://github.com/satijalab/seurat-data
library(Seurat)
library(SeuratData)
# InstallData("panc8")  # download the dataset once, if not already installed
data("panc8")

# panc8 is a merged Seurat object containing 8 separate pancreas datasets
# split the object by sequencing technology (the "tech" metadata column)
pancreas.list <- SplitObject(panc8, split.by = "tech")

# perform standard preprocessing on each object
for (i in seq_along(pancreas.list)) {
  pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
  pancreas.list[[i]] <- FindVariableFeatures(
    pancreas.list[[i]], selection.method = "vst",
    nfeatures = 2000, verbose = FALSE
  )
}

# find anchors
anchors <- FindIntegrationAnchors(object.list = pancreas.list)

# integrate data
integrated <- IntegrateData(anchorset = anchors)

## End(Not run)


[Package Seurat version 5.1.0 Index]