| Conos {conos} | R Documentation |
Conos R6 class
Description
The class encompasses sample collections, providing methods for calculating and visualizing joint graph and communities.
Public fields
samples: list of samples (Pagoda2 or Seurat objects)
pairs: pairwise alignment results
graph: alignment graph
clusters: list of clustering results named by clustering type
expression.adj: adjusted expression values
embeddings: list of joint embeddings
embedding: joint embedding
n.cores: number of cores
misc: list with unstructured additional info
override.conos.plot.theme: boolean Whether to override the conos plot theme
Methods
Public methods
Method new()
initialize Conos class
Usage
Conos$new( x, ..., n.cores = parallel::detectCores(logical = FALSE), verbose = TRUE, override.conos.plot.theme = FALSE )
Arguments
x: a named list of pagoda2 or Seurat objects (one per sample)
...: additional parameters upon initializing Conos
n.cores: numeric Number of cores to use (default=parallel::detectCores(logical=FALSE))
verbose: boolean Whether to provide verbose output (default=TRUE)
override.conos.plot.theme: boolean Whether to reset plot settings to the ggplot2 default (default=FALSE)
Returns
a new 'Conos' object
Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1)
Method addSamples()
Initialize or add a set of samples to the conos panel. Note: this will simply add samples, but will not update graph, clustering, etc.
Usage
Conos$addSamples(x, replace = FALSE, verbose = FALSE)
Arguments
x: a named list of pagoda2 or Seurat objects (one per sample)
replace: boolean Whether the existing samples should be purged before adding new ones (default=FALSE)
verbose: boolean Whether to provide verbose output (default=FALSE)
Returns
invisible view of the full sample list
Method buildGraph()
Build the joint graph that encompasses all the samples, establishing weighted inter-sample cell-to-cell links
Usage
Conos$buildGraph( k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, space = "PCA", matching.method = "mNN", metric = "angular", k1 = k, data.type = "counts", l2.sigma = 1e+05, var.scale = TRUE, ncomps = 40, n.odgenes = 2000, matching.mask = NULL, exclude.samples = NULL, common.centering = TRUE, verbose = TRUE, base.groups = NULL, append.global.axes = TRUE, append.decoys = TRUE, decoy.threshold = 1, n.decoys = k * 2, score.component.variance = FALSE, snn = FALSE, snn.quantile = 0.9, min.snn.jaccard = 0, min.snn.weight = 0, snn.k.self = k.self, balance.edge.weights = FALSE, balancing.factor.per.cell = NULL, same.factor.downweight = 1, k.same.factor = k, balancing.factor.per.sample = NULL )
Arguments
k: integer Size of the inter-sample neighborhood (default=15)
k.self: integer Size of the within-sample neighborhoods (default=10)
k.self.weight: numeric Weight multiplier on the intra-sample edges relative to inter-sample edges (default=0.1)
alignment.strength: numeric Alignment strength (default=NULL will result in alignment.strength=0)
space: character Reduced expression space used to establish putative alignments between pairs of samples (default='PCA'). Currently supported spaces are: "CPCA" (common principal component analysis), "JNMF" (joint NMF), "genes" (gene expression space, log2 transformed), "PCA" (principal component analysis), "CCA" (canonical correlation analysis), and "PMA" (Penalized Multivariate Analysis <https://cran.r-project.org/web/packages/PMA/index.html>)
matching.method: character Matching method (default='mNN'). Currently supported methods are "NN" (nearest neighbors) or "mNN" (mutual nearest neighbors).
metric: character Distance metric to measure similarity (default='angular'). Currently supported metrics are "angular" and "L2".
k1: numeric Neighborhood radius for identifying mutually-matching neighbors (default=k). Note that k1 must be greater than or equal to k, i.e. k1>=k. Increasing k1 beyond k will lead to more aggressive alignment of distinct subpopulations (i.e. increased alignment strengths).
data.type: character Type of data in the input pagoda2 objects within r.n (default='counts').
l2.sigma: numeric L2 distances get transformed as exp(-d/sigma) using this value (default=1e5)
var.scale: boolean Whether to use common variance scaling (default=TRUE). If TRUE, use geometric means for variance, as we're trying to focus on the common variance components. See scaledMatricesP2() code.
ncomps: integer Number of components (default=40)
n.odgenes: integer Number of overdispersed genes to be used in each pairwise alignment (default=2000)
matching.mask: an optional matrix explicitly specifying which pairs of samples should be compared (a symmetrical matrix of logical values with row and column names corresponding to sample names) (default=NULL). By default, comparisons between all pairs are allowed. The argument can be used to exclude comparisons across certain pairs of samples (e.g. technical replicates, which are expected to show very high similarity).
exclude.samples: optional list of sample names that should be excluded from the alignment and the resulting graph (default=NULL)
common.centering: boolean When calculating reduced expression space for a given sample pair, whether the expression of genes should be centered using the mean from both samples (TRUE) or using the mean within each sample (FALSE) (default=TRUE)
verbose: boolean Whether to provide verbose output (default=TRUE)
base.groups: an optional factor on cells specifying previously-obtained cell grouping to be used for adjusting the sample alignment (default: NULL). Specifically, cell clusters specified by base.groups can be used to i) calculate global expression axes, which are appended to the overall set of eigenvectors, and ii) add decoy cells.
append.global.axes: boolean Whether to project samples on global expression axes, as defined by a pre-defined (typically crude) set of cell subpopulations specified by the base.groups parameter (default=TRUE, but works only if base.groups is specified)
append.decoys: boolean Whether to use pre-defined cell groups (specified by base.groups) to append decoy cells to the samples which are otherwise lacking any of the pre-specified cell groups (default=TRUE, but works only if base.groups is specified). The decoy cells can reduce the number of erroneous matches in highly heterogeneous sample collections, where some of the samples lack entire cell subpopulations which are found in other samples. The approach only works if the base.groups (typically a crude clustering of top-level cell types) can be established with reasonable confidence.
decoy.threshold: integer Minimal number of cells of a given cell type that should exist in a given sample (according to base.groups) to avoid addition of decoy cells to that sample for the purposes of alignment (default=1)
n.decoys: integer Number of decoy cells that should be added to a sample that has fewer than decoy.threshold cells of a given cell type (default=k*2)
score.component.variance: boolean Whether to score the amount of total variance explained by different components (default=FALSE as it takes extra time to calculate)
snn: boolean Whether to transform the joint graph by computing a shared nearest neighborhood graph (analogous to Seurat 3), further weighting the edges between two matched cells based on the similarity (measured by Jaccard coefficient) of all of their predicted neighbors (across all of the samples) (default: FALSE)
snn.quantile: numeric Specifies how the shared neighborhood graph transformation will determine final edge weights. If snn.quantile=NULL, the edge weight will simply be equal to the Jaccard coefficient of the neighborhoods. If snn.quantile is a vector of two numeric values (p1, p2), they will be treated as quantile probabilities, and quantile values (q1, q2) on the set of all Jaccard coefficients (for all edges) will be determined. The edge weights will then be reset, so that edges with Jaccard coefficients below or equal to q1 will be set to 0, and those with coefficients >=q2 will be set to 1. The rest of the weights will be mapped uniformly from the [q1,q2] range to the [0,1] range. If a single numeric value is supplied, it will be treated as a symmetric quantile probability (i.e. snn.quantile=0.8 is equivalent to specifying snn.quantile=c(1-0.8,0.8)). (default: 0.9)
min.snn.jaccard: numeric Minimum Jaccard coefficient required for a shared neighborhood graph edge (default: 0). The edges with Jaccard coefficients below this threshold will be removed (i.e. weight set to 0)
min.snn.weight: numeric The shared nearest neighbor procedure will adjust the weights of the edges, and even eliminate some of the edges (by setting their weight to zero). The min.snn.weight parameter sets a minimal adjusted edge weight, so that the edge weight is never reduced below this level (and hence the edge is never deleted) (default: 0 - no adjustments)
snn.k.self: integer Size of the within-sample neighborhood to be used in shared nearest neighbor calculations (default=k.self)
balance.edge.weights: boolean Whether to balance edge weights to control for a cell- or sample-specific factor (default=FALSE)
balancing.factor.per.cell: A per-cell factor (discrete factor, named with cell names) specifying a design difference that should be controlled for by adjusting edge weights in the joint graph (default=NULL)
same.factor.downweight: numeric Optional weighting factor for edges connecting cells with the same cell factor level per cell balancing (default=1.0)
k.same.factor: integer Neighborhood size that should be used when aligning samples of the same balancing.factor.per.sample level. Setting a value smaller than k will reduce the alignment strength within the sample batches (default=k)
balancing.factor.per.sample: A covariate factor per sample that should be controlled for by adjusting edge weights in the joint graph (default=NULL)
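The matching.mask argument expects a symmetric logical matrix whose row and column names are sample names. The following is a minimal base R sketch of how such a mask might be built to exclude technical replicates. The sample names and the "_rep" naming scheme are hypothetical and chosen only for illustration.

```r
# Hypothetical sample names; in practice these would match the names of the
# sample list given to Conos$new()
sample.names <- c("donorA_rep1", "donorA_rep2", "donorB_rep1", "donorB_rep2")

# Start from a mask that allows every pair of samples to be compared
mask <- matrix(TRUE, nrow = length(sample.names), ncol = length(sample.names),
               dimnames = list(sample.names, sample.names))

# Hypothetical grouping: technical replicates share the same donor prefix
donor <- sub("_rep[0-9]+$", "", sample.names)

# Disallow comparisons between samples from the same donor
# (this also sets the diagonal, i.e. self-comparisons, to FALSE)
mask[outer(donor, donor, "==")] <- FALSE

# The mask has to be symmetric, as required by the argument description
stopifnot(all(mask == t(mask)))
print(mask)
```

A mask built this way could then be supplied as con$buildGraph(matching.mask = mask).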
Returns
joint graph to be used for downstream analysis
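As a small illustration of the l2.sigma argument, the sketch below applies the stated transformation exp(-d/sigma) to a few made-up L2 distances. The distance values are illustrative only. How conos uses the transformed values internally is not shown here.

```r
# Toy L2 distances between matched cells (illustrative values only)
d <- c(0, 1e4, 1e5, 5e5)

# Default value of l2.sigma in buildGraph()
sigma <- 1e5

# Transformation described for l2.sigma: larger distances give smaller values
w <- exp(-d / sigma)
print(round(w, 4))

# A smaller sigma makes the values decay faster with distance
w.small <- exp(-d / 1e4)
print(round(w.small, 4))
```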
Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
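The snn.quantile description above can be sketched in base R as follows. The helper snnQuantileWeights is hypothetical and is not a conos function. It only mirrors the documented mapping from Jaccard coefficients to edge weights, and the use of stats::quantile with its default settings is an assumption.

```r
# Hypothetical helper mirroring the documented snn.quantile behaviour
snnQuantileWeights <- function(jaccard, snn.quantile = 0.9) {
  # NULL: weights are simply the Jaccard coefficients
  if (is.null(snn.quantile)) return(jaccard)
  # a single value p is treated as the symmetric pair c(1 - p, p)
  if (length(snn.quantile) == 1) {
    snn.quantile <- sort(c(1 - snn.quantile, snn.quantile))
  }
  # quantile values (q1, q2) over all Jaccard coefficients
  q <- quantile(jaccard, probs = snn.quantile, names = FALSE)
  stopifnot(q[2] > q[1])
  # map [q1, q2] linearly onto [0, 1]; clamp values outside that range
  w <- (jaccard - q[1]) / (q[2] - q[1])
  pmin(pmax(w, 0), 1)
}

# Toy Jaccard coefficients for eight edges (illustrative values only)
jaccard <- c(0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95)
w <- snnQuantileWeights(jaccard, snn.quantile = 0.8)
print(round(w, 3))
```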
Method getDifferentialGenes()
Calculate genes differentially expressed between cell clusters. Estimates base mean, z-score, p-values, specificity, precision, expressionFraction, AUC (if append.auc=TRUE)
Usage
Conos$getDifferentialGenes( clustering = NULL, groups = NULL, z.threshold = 3, upregulated.only = FALSE, verbose = TRUE, append.specificity.metrics = TRUE, append.auc = TRUE )
Arguments
clustering: character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)
groups: a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)
z.threshold: numeric Minimum absolute value of a Z score for which the genes should be reported (default=3.0).
upregulated.only: boolean If TRUE, will report only genes significantly upregulated in each cluster; otherwise both up- and down-regulated genes will be reported (default=FALSE)
verbose: boolean Whether to provide verbose output (default=TRUE)
append.specificity.metrics: boolean Whether to append specificity metrics (default=TRUE)
append.auc: boolean Whether to append AUC scores (default=TRUE)
Returns
list of DE results; each is a data frame with rows corresponding to the differentially expressed genes, and columns listing log2 fold change (M), signed Z scores (both raw and adjusted for multiple hypothesis testing using BH correction), and optional specificity/sensitivity and AUC metrics.
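To make the relation between raw and BH-adjusted Z scores more concrete, here is a base R sketch under stated assumptions. It converts signed Z scores to two-sided p-values, applies p.adjust(method = "BH"), and converts back to signed Z scores. This is one plausible way to obtain an adjusted Z score. It is not confirmed to be the exact procedure used by getDifferentialGenes(), and the gene names and values are made up.

```r
# Toy signed Z scores for five genes (illustrative values only)
z <- c(geneA = 5.2, geneB = -4.1, geneC = 2.9, geneD = 0.3, geneE = -1.0)

# Two-sided p-values under a standard normal null
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Benjamini-Hochberg adjustment for multiple hypothesis testing
p.adj <- p.adjust(p, method = "BH")

# Convert adjusted p-values back to signed Z scores
z.adj <- sign(z) * qnorm(p.adj / 2, lower.tail = FALSE)

# Keep genes whose adjusted |Z| reaches the threshold (cf. z.threshold)
z.threshold <- 3
print(round(z.adj, 2))
print(names(z.adj)[abs(z.adj) >= z.threshold])
```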
Method findCommunities()
Find cell clusters (as communities on the joint graph)
Usage
Conos$findCommunities( method = leiden.community, min.group.size = 0, name = NULL, test.stability = FALSE, stability.subsampling.fraction = 0.95, stability.subsamples = 100, verbose = TRUE, cls = NULL, sr = NULL, ... )
Arguments
method: community detection method (igraph syntax) (default=leiden.community)
min.group.size: numeric Minimal allowed community size (default=0)
name: character Optional name of the clustering result (default=NULL will try to obtain the name from the community detection method, or will use 'community' as a default)
test.stability: boolean Whether to test stability of community detection (default=FALSE)
stability.subsampling.fraction: numeric Fraction of clusters to subset (default=0.95). Must be within range [0, 1].
stability.subsamples: integer Number of subsampling iterations (default=100)
verbose: boolean Whether to provide verbose output (default=TRUE)
cls: optional pre-calculated community result (may be useful for stability testing) (default: NULL)
sr: optional pre-calculated subsampled community results (useful for stability testing) (default: NULL)
...: extra parameters are passed to the specified community detection method
Returns
invisible list containing identified communities (groups) and the full community detection result (result). The results are stored in the $clusters$name slot in the conos object. Each such slot contains an object with elements: $results which stores the raw output of the community detection method, and $groups which is a factor on cells describing the resulting clustering. The latter can be used, for instance, in plotting: con$plotGraph(groups=con$clusters$leiden$groups). If test.stability==TRUE, then the result object will also contain a $stability slot.
Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
con$findCommunities(method = igraph::walktrap.community, steps=5)
Method plotPanel()
Plot panel of individual embeddings per sample with joint coloring
Usage
Conos$plotPanel( clustering = NULL, groups = NULL, colors = NULL, gene = NULL, use.local.clusters = FALSE, plot.theme = NULL, use.common.embedding = FALSE, embedding = NULL, adj.list = NULL, ... )
Arguments
clustering: character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default=NULL - if groups are not specified, the first clustering will be used)
groups: a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default=NULL, see clustering)
colors: a color factor (named with cell names) used for cell coloring
gene: show expression of a gene
use.local.clusters: boolean Whether clusters should be taken from the individual samples; otherwise joint clusters in the conos object will be used (see clustering) (default=FALSE).
plot.theme: string Theme for the plot, passed to plotSamples() (default=NULL)
use.common.embedding: boolean Whether a joint embedding in the conos object should be used (or embeddings determined for the individual samples) (default=FALSE)
embedding: (default=NULL) If a character value is passed, it is interpreted as an embedding name (a name of a joint embedding in conos when use.common.embedding=TRUE, or a name of an embedding within the individual objects when use.common.embedding=FALSE). If a matrix is passed, it is interpreted as an actual embedding (the first two columns are then interpreted as x/y coordinates, and row names must be cell names). If NULL, the default embedding will be used.
adj.list: an optional list of additional ggplot2 directions to apply (default=NULL)
...: Additional parameters passed to plotSamples(), plotEmbeddings(), sccore::embeddingPlot().
Returns
cowplot grid object with the panel of plots
Method embedGraph()
Generate an embedding of a joint graph
Usage
Conos$embedGraph( method = "largeVis", embedding.name = method, M = 1, gamma = 1, alpha = 0.1, perplexity = NA, sgd_batches = 1e+08, seed = 1, verbose = TRUE, target.dims = 2, ... )
Arguments
method: Embedding method (default='largeVis'). Currently 'largeVis' and 'UMAP' are supported.
embedding.name: character Optional name for the embedding, set by the user so that multiple embeddings can be stored (default: method name)
M: numeric (largeVis) The number of negative edges to sample for each positive edge to be used (default=1)
gamma: numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=1)
alpha: numeric (largeVis) Hyperparameter used in the default distance function, 1 / (1 + \alpha \cdot ||y_i - y_j||^2) (default=0.1). The function relates the distance between points in the low-dimensional projection to the likelihood that the two points are nearest neighbors. Increasing \alpha tends to push nodes and their neighbors closer together; decreasing \alpha produces a broader distribution. Setting \alpha to zero enables the alternative distance function. \alpha below zero is meaningless.
perplexity: (largeVis) The perplexity passed to largeVis (default=NA)
sgd_batches: (largeVis) The number of edges to process during SGD (default=1e8). Defaults to a value set based on the size of the dataset. If the parameter given is between 0 and 1, the default value will be multiplied by the parameter.
seed: numeric Random seed for the largeVis algorithm (default=1)
verbose: boolean Whether to provide verbose output (default=TRUE)
target.dims: numeric Number of dimensions for the reduction (default=2). Higher dimensions can be used to generate embeddings for subsequent reductions by other methods, such as tSNE
...: additional arguments, passed to UMAP embedding (run ?conos:::embedGraphUmap for more info)
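The effect of alpha on the default distance function can be explored with a short base R sketch. The helper neighborLikelihood is hypothetical and is not part of conos or largeVis. It only evaluates the documented expression 1 / (1 + alpha * ||y_i - y_j||^2) for two points in the low-dimensional embedding.

```r
# Hypothetical helper evaluating 1 / (1 + alpha * ||yi - yj||^2)
neighborLikelihood <- function(yi, yj, alpha = 0.1) {
  # alpha = 0 selects a different function and negative values are meaningless
  stopifnot(alpha > 0)
  1 / (1 + alpha * sum((yi - yj)^2))
}

# Two toy embedding coordinates with squared distance 3^2 + 4^2 = 25
yi <- c(0, 0)
yj <- c(3, 4)

# For a fixed distance, the value drops as alpha grows
for (a in c(0.01, 0.1, 1)) {
  cat("alpha =", a, "value =", round(neighborLikelihood(yi, yj, a), 4), "\n")
}
```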
Method plotClusterStability()
Plot cluster stability statistics.
Usage
Conos$plotClusterStability(clustering = NULL, what = "all")
Arguments
clustering: string Name of the clustering result to show (default=NULL)
what: string Show a specific plot (ari - adjusted rand index, fjc - flat Jaccard, hjc - hierarchical Jaccard, dend - cluster dendrogram, all - everything except 'dend') (default='all')
Returns
cluster stability statistics
Method plotGraph()
Plot joint graph
Usage
Conos$plotGraph( color.by = "cluster", clustering = NULL, embedding = NULL, groups = NULL, colors = NULL, gene = NULL, plot.theme = NULL, subset = NULL, ... )
Arguments
color.by: character A shortcut to color the plot by 'cluster' or by 'sample' (default: 'cluster'). If any other string is input, an error is thrown.
clustering: a character name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)
embedding: A character name of an embedding, or a matrix of the actual embedding (rownames should correspond to cells, first two columns to x/y coordinates). If NULL (default: NULL), the latest generated embedding will be used
groups: a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)
colors: a color factor (named with cell names) used for cell coloring (default=NULL)
gene: Show expression of a gene (default=NULL)
plot.theme: Theme for the plot, passed to sccore::embeddingPlot() (default=NULL)
subset: A subset of cells to show (default: NULL - shows all the cells)
...: Additional parameters passed to sccore::embeddingPlot()
Returns
ggplot2 plot of joint graph
Method correctGenes()
Smooth expression of genes to minimize the batch effect between samples. Uses diffusion of expression on the graph with the equation dv = exp(-a * (v + b))
Usage
Conos$correctGenes( genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, max.iters = 15, tol = 0.005, name = "diffusion", verbose = TRUE, count.matrix = NULL, normalize = TRUE )
Arguments
genes: List of genes to be smoothed (default=NULL will smooth top n.od.genes overdispersed genes)
n.od.genes: numeric If 'genes' is NULL, top n.od.genes of overdispersed genes are taken across all samples (default=500)
fading: numeric Level of fading of expression change from distance on the graph (parameter 'a' of the equation) (default=10)
fading.const: numeric Minimal penalty for each new edge during diffusion (parameter 'b' of the equation) (default=0.5)
max.iters: numeric Maximal number of diffusion iterations (default=15)
tol: numeric Tolerance after which the diffusion stops (default=5e-3)
name: string Name to save the correction (default='diffusion')
verbose: boolean Verbose mode (default=TRUE)
count.matrix: Alternative gene count matrix to correct (rows: genes, columns: cells; has to be dense matrix). Default: joint count matrix for all datasets.
normalize: boolean Whether to normalize values (default=TRUE)
Returns
smoothed expression of the input genes
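The roles of fading (a) and fading.const (b) in the equation dv = exp(-a * (v + b)) can be seen by evaluating it directly. The sketch below assumes that v is a distance on the graph, which is inferred from the argument descriptions rather than stated explicitly. It does not reproduce the diffusion procedure itself.

```r
# Default parameters of correctGenes()
a <- 10    # fading
b <- 0.5   # fading.const

# Toy values of v, assumed here to be distances on the graph
v <- c(0, 0.1, 0.25, 0.5)

# Equation from the method description
dv <- exp(-a * (v + b))
print(signif(dv, 3))

# With b = 0 there is no constant per-edge penalty, so dv equals 1 at v = 0
dv.no.const <- exp(-a * (v + 0))
print(signif(dv.no.const, 3))
```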
Method propagateLabels()
Estimate labeling distribution for each vertex, based on a partial labeling of the cells. There are two methods used for the propagation to calculate the distribution of labels: "solver" and "diffusion".
* "diffusion" (default) will estimate the labeling distribution for each vertex, based on provided labels using a random walk.
* "solver" will propagate labels using the algorithm described by Zhu, Ghahramani, Lafferty (2003) <http://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf>.
Confidence values are then calculated by taking the maximum value from this distribution of labels, for each cell.
Usage
Conos$propagateLabels(labels, method = "diffusion", ...)
Arguments
labels: Input labels
method: type of propagation. Either 'diffusion' or 'solver'. 'solver' gives better results but has bad asymptotics, so it is inappropriate for datasets > 20k cells. (default='diffusion')
...: additional arguments for conos:::propagateLabels* functions
Returns
list with three fields:
* labels = matrix with distribution of label probabilities for each vertex by rows.
* uncertainty = 1 - confidence values
* label.distribution = the distribution of labels calculated using either the methods "diffusion" or "solver"
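The relation between a label distribution, confidence and uncertainty can be sketched with a mock matrix in base R. The cell names, label names and probabilities below are made up, and the variable names are illustrative rather than the exact structure returned by propagateLabels().

```r
# Mock label distribution: rows are cells, columns are labels, rows sum to 1
label.dist <- matrix(c(0.90, 0.05, 0.05,
                       0.40, 0.35, 0.25,
                       0.10, 0.10, 0.80),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("cell1", "cell2", "cell3"),
                                     c("T", "B", "NK")))

# Confidence: maximum value of the label distribution for each cell
confidence <- apply(label.dist, 1, max)

# Uncertainty as described in the return value: 1 - confidence
uncertainty <- 1 - confidence

# Most likely label per cell (first maximum wins in case of ties)
assigned <- colnames(label.dist)[max.col(label.dist, ties.method = "first")]
names(assigned) <- rownames(label.dist)

print(data.frame(assigned, confidence, uncertainty))
```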
Method getClusterCountMatrices()
Calculate pseudo-bulk expression matrices for clusters (by adding up, for each gene, all of the molecules detected for all cells in a given cluster in a given sample)
Usage
Conos$getClusterCountMatrices( clustering = NULL, groups = NULL, common.genes = TRUE, omit.na.cells = TRUE )
Arguments
clustering: string Name of the clustering to use
groups: a factor on cells to use for coloring
common.genes: boolean Whether to bring individual sample matrices to a common gene list (default=TRUE)
omit.na.cells: boolean If set to FALSE, the resulting matrices will include a first column named 'NA' that will report total molecule counts for all of the cells that were not covered by the provided factor. (default=TRUE)
Returns
a list of per-sample uniform dense matrices with rows being genes, and columns being clusters
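The pseudo-bulk aggregation described above (adding up, for each gene, the counts of all cells in a cluster) can be illustrated for a single sample with base R. The toy matrix and cluster assignments are made up, and using rowsum() is an assumption made for this sketch rather than the way conos computes the matrices.

```r
# Toy count matrix for one sample: genes in rows, cells in columns
counts <- matrix(c(1, 0, 2, 3,
                   0, 5, 1, 0,
                   4, 1, 0, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"),
                                 c("c1", "c2", "c3", "c4")))

# Cluster assignment as a factor named with cell names
groups <- factor(c(c1 = "A", c2 = "B", c3 = "A", c4 = "B"))

# rowsum() adds up rows by group, so work on the cells-by-genes transpose
# and transpose back to get genes in rows and clusters in columns
pseudo.bulk <- t(rowsum(t(counts), group = groups[colnames(counts)]))
print(pseudo.bulk)
```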
Method getDatasetPerCell()
applies 'getCellNames()' on all samples
Usage
Conos$getDatasetPerCell()
Returns
list of cellnames for all samples
Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getDatasetPerCell()
Method getJointCountMatrix()
Retrieve joint count matrices
Usage
Conos$getJointCountMatrix(raw = FALSE)
Arguments
raw: boolean If TRUE, return merged "raw" count matrices, using function getRawCountMatrix(). Otherwise, return the merged count matrices, using getCountMatrix(). (default=FALSE)
Returns
list of merged count matrices
Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getJointCountMatrix()
Method clone()
The objects of this class are cloneable with this method.
Usage
Conos$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `Conos$new`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
## ------------------------------------------------
## Method `Conos$buildGraph`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
## ------------------------------------------------
## Method `Conos$findCommunities`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
con$findCommunities(method = igraph::walktrap.community, steps=5)
## ------------------------------------------------
## Method `Conos$getDatasetPerCell`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getDatasetPerCell()
## ------------------------------------------------
## Method `Conos$getJointCountMatrix`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getJointCountMatrix()