
getSharedClones {APackOfTheClones}

R Documentation

Compute a list of clonotypes that are shared between Seurat clusters

Description

This function returns a list of clonotypes that are shared between clusters, where clusters are defined by the levels of the active cell identities or, if `alt_ident` is set, by a custom identity column. The names of the returned list are the shared clonotypes, and each value is a numeric vector of the indices of the clusters in which that clonotype is found. Each index corresponds to the position of the cluster in the default levels of the factored identities.
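A minimal base-R sketch of the returned structure, assuming "shared" means found in at least two identities (as the default `publicity = c(2L, Inf)` suggests): cells are grouped by clonotype, and each cluster is reported as its position in `levels()` of the identity factor. The helper `shared_clones_sketch` and the toy clonotypes are hypothetical and do not reflect how APackOfTheClones computes this internally.

```r
# Hypothetical sketch: NOT the APackOfTheClones implementation.
# Maps each clonotype to the indices (positions in levels(ident)) of the
# clusters it appears in, keeping only clonotypes seen in >= 2 clusters.
shared_clones_sketch <- function(clonotype, ident) {
  # Keep the existing level order (and unused levels) if ident is a factor
  if (!is.factor(ident)) ident <- factor(ident)
  cluster_index <- as.integer(ident)  # level position of each cell's cluster
  has_clone <- !is.na(clonotype)      # cells without a clonotype are ignored
  per_clone <- split(cluster_index[has_clone], clonotype[has_clone])
  per_clone <- lapply(per_clone, function(idx) sort(unique(idx)))
  per_clone[lengths(per_clone) >= 2]
}

clonotype <- c("CASSA", "CASSA", "CASSB", "CASSC", "CASSC", "CASSC", NA)
ident <- factor(
  c("T1", "T3", "T1", "T2", "T2", "T3", "T1"),
  levels = c("T1", "T2", "T3")
)
shared <- shared_clones_sketch(clonotype, ident)
shared
# $CASSA
# [1] 1 3
#
# $CASSC
# [1] 2 3
```

`CASSB` is dropped because it only occurs in `T1`, and the `NA` cell is ignored.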

If `run_id` is supplied, the function attempts to retrieve the shared clonotypes from the corresponding APackOfTheClones run generated by `RunAPOTC`. Otherwise, it uses the filtering and subsetting parameters to compute the shared clones.

Usage

getSharedClones(
  seurat_obj,
  reduction_base = "umap",
  clonecall = "strict",
  ...,
  extra_filter = NULL,
  alt_ident = NULL,
  run_id = NULL,
  top = NULL,
  top_per_cl = NULL,
  intop = NULL,
  intop_per_cl = NULL,
  publicity = c(2L, Inf)
)

Arguments

`seurat_obj`	Seurat object with one or more dimension reductions that has already been integrated with a TCR/BCR library using `scRepertoire::combineExpression`.
`reduction_base`	character. The Seurat reduction to base the clonal expansion plotting on. Defaults to `'umap'`, but can be any reduction present in the reductions slot of the input Seurat object, including custom ones. If `'pca'`, the cluster coordinates will be based on PC1 and PC2. However, APackOfTheClones is generally used to display UMAP and occasionally t-SNE embeddings, where it intuitively highlights clonal expansion.
`clonecall`	character. The column name in the seurat object metadata to use. See `scRepertoire` documentation for more information about this parameter that is central to both packages.
`...`	additional "subsetting" keyword arguments indicating the rows corresponding to elements in the Seurat object metadata that should be filtered by. E.g., `seurat_clusters = c(1, 9, 10)` will filter the cells to those whose `seurat_clusters` value is 1, 9, or 10. Unfortunately, column names in the Seurat object metadata cannot conflict with the keyword arguments. MAJOR NOTE: if a subsetting keyword argument is a prefix of a preceding argument name (e.g. a column named `reduction` is a prefix of the `reduction_base` argument), R will partially match it to that argument unless the preceding argument is also named explicitly. Additionally, any arguments after `...` must be named in full.
`extra_filter`	character. An additional string, formatted exactly like an expression one would pass into `dplyr::filter`, that performs additional filtering on the cells in the Seurat object based on the metadata, on top of the other keyword arguments. It is logically AND'ed with any keyword argument filters, which makes it a more flexible alternative or addition to the filtering keyword arguments. For example, to filter by the length of the TCR amino acid sequence, one could pass in something like `extra_filter = "nchar(CTaa) - 1 > 10"`. When the expression involves character values, enclose them in single quotes.
`alt_ident`	character. By default, cluster identity is assumed to be whatever is in `Idents(seurat_obj)`, and clones are grouped by the active identity. However, `alt_ident` can be set to the name of a column in the Seurat object metadata to group by instead. This column is meant to be a product of `Seurat::StashIdent` or added manually.
`run_id`	character. The ID associated with the data of a run, which is also used by other important functions like `APOTCPlot` and `AdjustAPOTC`. Defaults to `NULL`, in which case the ID will be generated in the format `reduction_base;clonecall;keyword_arguments;extra_filter`, where `keyword_arguments` and `extra_filter` are each an underscore character if there was no input for the `...` and `extra_filter` parameters, respectively.
`top`	integer or numeric in (0, 1). If not `NULL`, filters the output clones so that only the top `top` shared clonotypes by count (integer input) or the top `top` proportion of shared clonotypes (numeric input in (0, 1)) are kept. When several clonotypes tie in size, which clonotype(s) are kept is not guaranteed, but it is deterministic given that the other arguments are identical.
`top_per_cl`	integer or numeric in (0, 1). If not `NULL`, filters the output clones so that, for each Seurat cluster, only the top `top_per_cl` clonotypes by count or proportion are preserved when aggregating shared clones, in the same way as `top`. If used in conjunction with `top`, the result is the intersection of the clonotypes filtered each way. When several clonotypes tie in size, which clonotype(s) are kept is not guaranteed, but it is deterministic given that the other arguments are identical.
`intop`	integer or numeric in (0, 1). If not `NULL`, filters the raw clone sizes before computing the shared clonotypes, so that only the clonotypes whose overall size is among the `intop` largest (integer input) or in the top `intop` proportion (numeric input in (0, 1)) are kept. To emphasize, this argument does not necessarily return `intop` shared clones and will likely return somewhat fewer, because it filters the raw clone sizes, and it is very likely that not all of those clones end up being shared.
`intop_per_cl`	integer or numeric in (0, 1). If not `NULL`, filters the raw per-cluster clone sizes before computing shared clones, so that within every Seurat cluster, only the top `intop_per_cl` clones by count (or proportion, for numeric input in (0, 1)) are kept.
`publicity`	numeric pair. A simple filter range of `c(lowerbound, upperbound)` that retains only shared clones whose "publicity", the number of clusters they are present in, falls within this range.
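The prefix caveat for `...` follows from R's argument matching rules: formal arguments listed before `...` can be matched by a partial name, while those after `...` only match their full name. The function `mock_get_shared` below is a hypothetical stand-in that copies the leading part of the `getSharedClones` signature and simply reports where each argument ended up.

```r
# Hypothetical stand-in mirroring the argument order of getSharedClones;
# it only reports how R matched the supplied arguments.
mock_get_shared <- function(seurat_obj, reduction_base = "umap",
                            clonecall = "strict", ..., extra_filter = NULL) {
  list(reduction_base = reduction_base, dots = list(...))
}

# `reduction` is a prefix of `reduction_base`, so R partially matches it
# and nothing reaches `...`:
res1 <- mock_get_shared("obj", reduction = "tsne")
res1$reduction_base  # "tsne"
length(res1$dots)    # 0

# Naming `reduction_base` explicitly frees `reduction` to go into `...`:
res2 <- mock_get_shared("obj", reduction_base = "umap", reduction = "tsne")
res2$reduction_base  # "umap"
names(res2$dots)     # "reduction"

# Arguments after `...` are never partially matched, so `extra` lands
# in `...` instead of `extra_filter`:
res3 <- mock_get_shared("obj", extra = "x")
names(res3$dots)     # "extra"
```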
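To picture how keyword subsetting and `extra_filter` combine, the base-R sketch below treats each keyword argument as a `%in%` test on the metadata column of the same name and AND's the result with the parsed `extra_filter` string. The helper `filter_cells_sketch` and the toy metadata are hypothetical; the package describes `extra_filter` in terms of `dplyr::filter`, whereas this sketch evaluates it with base R's `str2lang()` (R >= 3.6.0) and `eval()`.

```r
# Hypothetical sketch of the documented filtering semantics, not package code.
filter_cells_sketch <- function(meta, ..., extra_filter = NULL) {
  keep <- rep(TRUE, nrow(meta))
  filters <- list(...)
  # Each keyword argument keeps rows whose column value is one of the given values
  for (col in names(filters)) {
    keep <- keep & (meta[[col]] %in% filters[[col]])
  }
  # extra_filter is parsed into an R expression, evaluated with the metadata
  # columns in scope, and AND'ed with the keyword filters
  if (!is.null(extra_filter)) {
    keep <- keep & eval(str2lang(extra_filter), envir = meta)
  }
  meta[keep, , drop = FALSE]
}

meta <- data.frame(
  seurat_clusters = c(1, 2, 9, 10, 1),
  orig.ident = c("P17B", "P17B", "P18B", "P18B", "P17B"),
  CTaa = c("CASSLGQ_CASS", "CAS", "CASSPGTEAFF_CAW", "CA", "CASSQ"),
  stringsAsFactors = FALSE
)

# Clusters 1, 9 and 10 AND an amino acid string longer than 11 characters
out1 <- filter_cells_sketch(meta, seurat_clusters = c(1, 9, 10),
                            extra_filter = "nchar(CTaa) - 1 > 10")
out1$seurat_clusters  # 1 9

# Character values inside the string are written with single quotes
out2 <- filter_cells_sketch(meta, extra_filter = "orig.ident == 'P18B'")
out2$seurat_clusters  # 9 10
```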
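The output filters `publicity` and `top` can be pictured with the sketch below, which takes a hypothetical named vector of total sizes for each shared clone together with its publicity. It assumes inclusive publicity bounds, applies the publicity range before `top`, converts a proportion into a count with `ceiling()`, and breaks ties by input order through the stable `order()`. All of these choices, and the helper names, are assumptions for illustration rather than documented package behavior.

```r
# Hypothetical helpers; the rounding of proportions, inclusive bounds and
# filter order are assumptions, not documented APackOfTheClones behavior.
n_to_keep <- function(top, n_total) {
  # numeric in (0, 1): a proportion of all clones; otherwise a count
  if (top > 0 && top < 1) ceiling(top * n_total) else min(top, n_total)
}

filter_shared_sketch <- function(sizes, publicity, top = NULL,
                                 publicity_range = c(2, Inf)) {
  in_range <- publicity >= publicity_range[1] & publicity <= publicity_range[2]
  sizes <- sizes[in_range]
  if (!is.null(top)) {
    # order() is stable, so tied clones keep their input order:
    # deterministic, but not chosen by any meaningful criterion
    ranked <- order(sizes, decreasing = TRUE)
    sizes <- sizes[ranked[seq_len(n_to_keep(top, length(sizes)))]]
  }
  names(sizes)
}

sizes     <- c(CASSA = 12, CASSB = 7, CASSC = 7, CASSD = 3, CASSE = 20)
publicity <- c(CASSA = 2,  CASSB = 3, CASSC = 2, CASSD = 4, CASSE = 5)

filter_shared_sketch(sizes, publicity, top = 2)
# returns c("CASSE", "CASSA")
filter_shared_sketch(sizes, publicity, top = 0.5)
# ceiling(0.5 * 5) = 3 clones: c("CASSE", "CASSA", "CASSB")
filter_shared_sketch(sizes, publicity, publicity_range = c(3, Inf))
# returns c("CASSB", "CASSD", "CASSE")
```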
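The difference between `intop` and `top` can be seen on a toy example: `intop` trims the raw clone sizes before sharing is computed, so a large clone confined to a single cluster can take one of the `intop` slots and fewer than `intop` shared clones come out. The clonotypes, clusters, and the helper `shared_among` are hypothetical and only illustrate the order of operations described above.

```r
# Hypothetical sketch contrasting `intop` with `top`; not package code.
clonotype <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D")
cluster   <- c(1, 1, 1, 2, 1, 1, 1, 1, 3, 2, 3)

# Return the clonotypes among `clones` present in at least two distinct clusters
shared_among <- function(clones) {
  sel <- clonotype %in% clones
  n_clusters <- tapply(cluster[sel], clonotype[sel],
                       function(x) length(unique(x)))
  names(n_clusters)[n_clusters >= 2]
}

# Raw sizes, largest first: A = 4, B = 3, C = 2, D = 2
raw_sizes <- sort(table(clonotype), decreasing = TRUE)

# intop = 2: keep the two largest raw clones (A, B), then look for sharing
intop_result <- shared_among(names(raw_sizes)[seq_len(2)])
intop_result  # "A" only, since B sits entirely in cluster 1

# Without the prefilter, three clones are shared, so top = 2 could keep two
all_shared <- shared_among(unique(clonotype))
all_shared    # "A" "C" "D"
```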

Value

A named list where each name is a clonotype and each element is a numeric vector indicating which Seurat cluster(s) it is in, in no particular order. If no shared clones are present, the output is an empty list.
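Because the result is an ordinary named list, base R functions are enough to inspect it. In the sketch below, `shared` is a hand-written stand-in for a `getSharedClones()` result with made-up clonotype strings; `lengths()` recovers each clonotype's publicity.

```r
# `shared` is a hypothetical stand-in for a getSharedClones() result;
# the clonotype strings and cluster indices are made up.
shared <- list(
  "CASSLGQ_CASSF" = c(1, 4),
  "CAVRD_CASSQ" = c(2, 3, 7)
)

names(shared)                  # the shared clonotypes
lengths(shared)                # publicity: number of clusters per clonotype
shared[lengths(shared) >= 3]   # clonotypes found in three or more clusters
length(shared) == 0            # TRUE would mean no shared clones were found
```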

Examples

library(APackOfTheClones)
data("combined_pbmc")

getSharedClones(combined_pbmc)

getSharedClones(
    combined_pbmc,
    orig.ident = c("P17B", "P18B"), # a named subsetting parameter
    clonecall = "aa"
)

# extract shared clones from a past RunAPOTC run
combined_pbmc <- RunAPOTC(
    combined_pbmc, run_id = "foo", verbose = FALSE
)

getSharedClones(
    combined_pbmc, run_id = "foo", top = 5
)

# doing a run and then getting the clones works too
combined_pbmc <- RunAPOTC(combined_pbmc, run_id = "run1", verbose = FALSE)
getSharedClones(combined_pbmc, run_id = "run1")

[Package APackOfTheClones version 1.2.0 Index]