Conos {conos} | R Documentation |

The class encompasses sample collections, providing methods for calculating and visualizing joint graph and communities.

`samples`

list of samples (Pagoda2 or Seurat objects)

`pairs`

pairwise alignment results

`graph`

alignment graph

`clusters`

list of clustering results named by clustering type

`expression.adj`

adjusted expression values

`embeddings`

list of joint embeddings

`embedding`

joint embedding

`n.cores`

number of cores

`misc`

list with unstructured additional info

`new()`

initialize Conos class

Conos$new( x, ..., n.cores = parallel::detectCores(logical = FALSE), verbose = TRUE, override.conos.plot.theme = FALSE )

`x`

a named list of pagoda2 or Seurat objects (one per sample)

`...`

additional parameters upon initializing Conos

`n.cores`

numeric Number of cores to use (default=parallel::detectCores(logical=FALSE))

`verbose`

boolean Whether to provide verbose output (default=TRUE)

`override.conos.plot.theme`

boolean Whether to reset plot settings to the ggplot2 default (default=FALSE)

a new 'Conos' object

con <- Conos$new(small_panel.preprocessed, n.cores=1)

`addSamples()`

Initialize or add a set of samples to the conos panel. Note: this will simply add samples, but will not update graph, clustering, etc.

Conos$addSamples(x, replace = FALSE, verbose = FALSE)

`x`

a named list of pagoda2 or Seurat objects (one per sample)

`replace`

boolean Whether the existing samples should be purged before adding new ones (default=FALSE)

`verbose`

boolean Whether to provide verbose output (default=FALSE)

invisible view of the full sample list

`buildGraph()`

Build the joint graph that encompasses all the samples, establishing weighted inter-sample cell-to-cell links

Conos$buildGraph( k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, space = "PCA", matching.method = "mNN", metric = "angular", k1 = k, data.type = "counts", l2.sigma = 1e+05, var.scale = TRUE, ncomps = 40, n.odgenes = 2000, matching.mask = NULL, exclude.samples = NULL, common.centering = TRUE, verbose = TRUE, base.groups = NULL, append.global.axes = TRUE, append.decoys = TRUE, decoy.threshold = 1, n.decoys = k * 2, score.component.variance = FALSE, snn = FALSE, snn.quantile = 0.9, min.snn.jaccard = 0, min.snn.weight = 0, snn.k.self = k.self, balance.edge.weights = FALSE, balancing.factor.per.cell = NULL, same.factor.downweight = 1, k.same.factor = k, balancing.factor.per.sample = NULL )

`k`

integer integer Size of the inter-sample neighborhood (default=15)

`k.self`

integer Size of the with-sample neighborhoods (default=10).

`k.self.weight`

numeric Weight multiplier on the intra-sample edges relative to inter-sample edges (default=0.1)

`alignment.strength`

numeric Alignment strength (default=NULL will result in alignment.strength=0)

`space`

character Reduced expression space used to establish putative alignments between pairs of samples (default='PCA'). Currently supported spaces are: — "CPCA" Common principal component analysis — "JNMF" Joint NMF — "genes" Gene expression space (log2 transformed) — "PCA" Principal component analysis — "CCA" Canonical correlation analysis — "PMA" (Penalized Multivariate Analysis <https://cran.r-project.org/web/packages/PMA/index.html>)

`matching.method`

character Matching method (default='mNN'). Currently supported methods are "NN" (nearest neighbors) or "mNN" (mututal nearest neighbors).

`metric`

character Distance metric to measure similarity (default='angular'). Currenlty supported metrics are "angular" and "L2".

`k1`

numeric Neighborhood radius for identifying mutually-matching neighbors (default=k). Note that k1 must be greater than or equal to k, i.e. k1>=k. Increasing k1 beyond k will lead to more aggressive alignment of distinct subpopulations (i.e. increased alignment strengths).

`data.type`

character Type of data type in the input pagoda2 objects within r.n (default='counts').

`l2.sigma`

numeric L2 distances get transformed as exp(-d/sigma) using this value (default=1e5)

`var.scale`

boolean Whether to use common variance scaling (default=TRUE). If TRUE, use geometric means for variance, as we're trying to focus on the common variance components. See scaledMatricesP2() code.

`ncomps`

integer Number of components (default=40)

`n.odgenes`

integer Number of overdispersed genes to be used in each pairwise alignment (default=2000)

`matching.mask`

an optional matrix explicitly specifying which pairs of samples should be compared (a symmetrical matrix of logical values with row and column names corresponding to sample names). (default=NULL). By default, comparisons between all paris are allowed. The argument can be used to exclude comparisons across certain pairs of samples (e.g. techincal replicates, which are expected to show very high similarity).

`exclude.samples`

optional list of sample names that should be excluded from the alignment and the resulting graph (default=NULL)

`common.centering`

boolean When calculating reduced expression space for a given sample pair, whether the expression of genes should be centered using the mean from both samples (TRUE) or using the mean within each sample (FALSE) (default=TRUE)

`verbose`

boolean Whether to provide verbose output (default=TRUE)

`base.groups`

an optional factor on cells specifying previously-obtained cell grouping to be used for adjusting the sample alignment (default: NULL). Specifically, cell clusters specfiieid by the base.groups can be used to i) calculate global expression axes which are appended to the overall set of eigenvectors, ii) adding decoy cells.

`append.global.axes`

boolean Whether to project samples on global expression axes, as defined by pre-defined (typically crude) set of cell subpopulations as specified by the base.gruops parameter (default=TRUE, but works only if base.groups is specified)

`append.decoys`

boolean Whether to use pre-defined cell groups (specified by base.groups) to append decoy cells to the samples which are otherwise lacking any of the pre-specified cell groups (default=TRUE, but works only if base.groups is specified). The decoy cells can reduce the number of erroneous matches in highly heterogeneous sample collections, where some of the samples lack entire cell subpopulations which are found in other samples. The approach only works if the base.groups (typically a crude clustering of top-level cell types) can be established with a reasonable confidence.

`decoy.threshold`

integer Minimal number of cells of a given cell type that should exist in a given sample (according to base.groups) to avoid addition of decoy cells to that sample for the purposes of alignment (default=1)

`n.decoys`

integer Number of decoy cells that should be added to a sample that had less than decoy.threshold cells of a given cell type (default=k*2)

`score.component.variance`

boolean Whether to score the amount of total variance explained by different components (default=FALSE as it takes extra time to calculate)

`snn`

boolean Whether to transform the joint graph by computing a shared nearest neighborhood graph (analogous to Seurat 3), further weighting the edges between two matched cells based on the similarity (measured by Jaccard coefficient) of all of their predicted neighbors (across all of the samples) (default: FALSE)

`snn.quantile`

numeric Specifies how the shared neighborhood graph transformation will determine final edge weights. If snn.quantile=NULL, the edge weight will be simply equal to the Jaccard coefficient of the neighborhoods. If snn.quantile is a vector of two numeric values (p1, p2), they will be treated as quantile probabilities, and quantile values (q1,q2) on the set of all Jaccard coefficients (for all edges) will be determiend. The edge weights will then be reset, so that edges with Jaccard coefficients below or equal to q1 will be set to 0, and those with coefficients >=q2 will be set to 1. The rest of the weights will be mapped uniformly from [q1,q2]->[0,1] range. If a single numeric value is supplied, it will be treated as a symmetric quantile probability (i.e. snn.quantile=0.8 is equivalent to specifying snn.quantile=c(1-0.8,0.8)). (default: 0.9)

`min.snn.jaccard`

numeric Minimum Jaccard coefficient required for a shared neighborhood graph edge (default: 0). The edges with Jaccard coefficients below this threshold will be removed (i.e. weight set to 0)

`min.snn.weight`

numeric Shared nearest neighbor procedure will adjust the weights of the edges, and even eliminate some of the edges (by setting their weight to zero). The min.snn.weight parameter allows to set a minimal adjusted edge weight, so that the edge weight is never reduced beyond this level (and hence never deleted) (default: 0 - no adjustments)

`snn.k.self`

integer Size of the within-sample neighorhood to be used in shared nearest neighbor calculations (default=k.self)

`balance.edge.weights`

boolean Whether to balance edge weights to control for a cell- or sample- specific factor (default=FALSE)

`balancing.factor.per.cell`

A per-cell factor (discrete factor, named with cell names) specifying a design difference should be controlled for by adjusting edge weights in the joint graph (default=NULL)

`same.factor.downweight`

numeric Optional weighting factor for edges connecting cells with the same cell factor level per cell balancing (default=1.0)

`k.same.factor`

integer An neighborhood size that should be used when aligning samples of the same balancing.factor.per.sample level. Setting a value smaller than k will lead to reduction of alingment strenth within the sample batches (default=k)

`balancing.factor.per.sample`

A covariate factor per sample that should be controlled for by adjusting edge weights in the joint graph (default=NULL)

joint graph to be used for downstream analysis

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE)

`getDifferentialGenes()`

Calculate genes differentially expressed between cell clusters. Estimates base mean, z-score, p-values, specificity, precision, expressionFraction, AUC (if append.auc=TRUE)

Conos$getDifferentialGenes( clustering = NULL, groups = NULL, z.threshold = 3, upregulated.only = FALSE, verbose = TRUE, append.specificity.metrics = TRUE, append.auc = TRUE )

`clustering`

character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)

`groups`

a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)

`z.threshold`

numeric Minimum absolute value of a Z score for which the genes should be reported (default=3.0).

`upregulated.only`

boolean If TRUE, will report only genes significantly upregulated in each cluster; otherwise both up- and down-regulated genes will be reported (default=FALSE)

`verbose`

boolean Whether to provide verbose output (default=TRUE)

`append.specificity.metrics`

boolean Whether to append specificity metrics (default=TRUE)

`append.auc`

boolean Whether to append AUC scores (default=TRUE)

list of DE results; each is a data frame with rows corresponding to the differentially expressed genes, and columns listing log2 fold change (M), signed Z scores (both raw and adjusted for mulitple hypothesis using BH correction), optional specificty/sensitivity and AUC metrics.

`findCommunities()`

Find cell clusters (as communities on the joint graph)

Conos$findCommunities( method = leiden.community, min.group.size = 0, name = NULL, test.stability = FALSE, stability.subsampling.fraction = 0.95, stability.subsamples = 100, verbose = TRUE, cls = NULL, sr = NULL, ... )

`method`

community detection method (igraph syntax) (default=leiden.community)

`min.group.size`

numeric Minimal allowed community size (default=0)

`name`

character Optional name of the clustering result (will default to the algorithm name) (default=NULL will try to obtain the name from the community detection method, or will use 'community' as a default)

`test.stability`

boolean Whether to test stability of community detection (default=FALSE)

`stability.subsampling.fraction`

numeric Fraction of clusters to subset (default=0.95). Must be within range [0, 1].

`stability.subsamples`

integer Number of subsampling iterations (default=100)

`verbose`

boolean Whether to provide verbose output (default=TRUE)

`cls`

optional pre-calculated community result (may be useful for stability testing) (default: NULL)

`sr`

optional pre-calculated subsampled community results (useful for stability testing) (default: NULL)

`...`

extra parameters are passed to the specified community detection method

invisible list containing identified communities (groups) and the full community detection result (result); The results are stored in $clusters$name slot in the conos object. Each such slot contains an object with elements: $results which stores the raw output of the community detection method, and $groups which is a factor on cells describing the resulting clustering. The later can be used, for instance, in plotting: con$plotGraph(groups=con$clusters$leiden$groups). If test.stability==TRUE, then the result object will also contain a $stability slot.

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE) con$findCommunities(method = igraph::walktrap.community, steps=5)

`plotPanel()`

Plot panel of individual embeddings per sample with joint coloring

Conos$plotPanel( clustering = NULL, groups = NULL, colors = NULL, gene = NULL, use.local.clusters = FALSE, plot.theme = NULL, use.common.embedding = FALSE, embedding = NULL, adj.list = NULL, ... )

`clustering`

character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default=NULL - if groups are not specified, the first clustering will be used)

`groups`

a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default=NULL, see clustering)

`colors`

a color factor (named with cell names) use for cell coloring

`gene`

show expression of a gene

`use.local.clusters`

boolean Whether clusters should be taken from the individual samples; otherwise joint clusters in the conos object will be used (see clustering) (default=FALSE).

`plot.theme`

string Theme for the plot, passed to plotSamples() (default=NULL)

`use.common.embedding`

boolean Whether a joint embedding in the conos object should be used (or embeddings determined for the individual samples) (default=FALSE)

`embedding`

(default=NULL) If a character value is passed, it is interpreted as an embedding name (a name of a joint embedding in conos when use.commmon.embedding=TRUE, or a name of an embedding within the individual objects when use.common.embedding=FALSE). If a matrix is passed, it is interpreted as an actual embedding (then first two columns are interpreted as x/y coordinates, row names must be cell names). If NULL, the default embedding will be used.

`adj.list`

an optional list of additional ggplot2 directions to apply (default=NULL)

`...`

Additional parameters passed to plotSamples(), plotEmbeddings(), sccore::embeddingPlot().

cowplot grid object with the panel of plots

`embedGraph()`

Generate an embedding of a joint graph

Conos$embedGraph( method = "largeVis", embedding.name = method, M = 1, gamma = 1, alpha = 0.1, perplexity = NA, sgd_batches = 1e+08, seed = 1, verbose = TRUE, target.dims = 2, ... )

`method`

Embedding method (default='largeVis'). Currently 'largeVis' and 'UMAP' are supported.

`embedding.name`

character Optional name of the name of the embedding set by user to store multiple embeddings (default: method name)

`M`

numeric (largeVis) The number of negative edges to sample for each positive edge to be used (default=1)

`gamma`

numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=1)

`alpha`

numeric (largeVis) Hyperparameter used in the default distance function,

*1 / (1 + α \dot ||y_i - y_j||^2)*(default=0.1). The function relates the distance between points in the low-dimensional projection to the likelihood that the two points are nearest neighbors. Increasing*α*tends to push nodes and their neighbors closer together; decreasing*α*produces a broader distribution. Setting*α*to zero enables the alternative distance function.*α*below zero is meaningless.`perplexity`

(largeVis) The perplexity passed to largeVis (default=NA)

`sgd_batches`

(largeVis) The number of edges to process during SGD (default=1e8). Defaults to a value set based on the size of the dataset. If the parameter given is between

`0`

and`1`

, the default value will be multiplied by the parameter.`seed`

numeric Random seed for the largeVis algorithm (default=1)

`verbose`

boolean Whether to provide verbose output (default=TRUE)

`target.dims`

numeric Number of dimensions for the reduction (default=2). Higher dimensions can be used to generate embeddings for subsequent reductions by other methods, such as tSNE

`...`

additional arguments, passed to UMAP embedding (run ?conos:::embedGraphUmap for more info)

`plotClusterStability()`

Plot cluster stability statistics.

Conos$plotClusterStability(clustering = NULL, what = "all")

`clustering`

string Name of the clustering result to show (default=NULL)

`what`

string Show a specific plot (ari - adjusted rand index, fjc - flat Jaccard, hjc - hierarchical Jaccard, dend - cluster dendrogram, all - everything except 'dend') (default='all')

cluster stability statistics

`plotGraph()`

Plot joint graph

Conos$plotGraph( color.by = "cluster", clustering = NULL, embedding = NULL, groups = NULL, colors = NULL, gene = NULL, plot.theme = NULL, subset = NULL, ... )

`color.by`

character A shortcut to color the plot by 'cluster' or by 'sample' (default: 'cluster'). If any other string is input, an error is thrown.

`clustering`

a character name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)

`embedding`

A character name of an embedding, or a matrix of the actual embedding (rownames should correspond to cells, first to columns to x/y coordinates). If NULL (default: NULL), the latest generated embedding will be used

`groups`

a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)

`colors`

a color factor (named with cell names) use for cell coloring (default=NULL)

`gene`

Show expression of a gene (default=NULL)

`plot.theme`

Theme for the plot, passed to sccore::embeddingPlot() (default=NULL)

`subset`

A subset of cells to show (default: NULL - shows all the cells)

`...`

Additional parameters passed to sccore::embeddingPlot()

ggplot2 plot of joint graph

`correctGenes()`

Smooth expression of genes to minimize the batch effect between samples Use diffusion of expression on graph with the equation dv = exp(-a * (v + b))

Conos$correctGenes( genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, max.iters = 15, tol = 0.005, name = "diffusion", verbose = TRUE, count.matrix = NULL, normalize = TRUE )

`genes`

List of genes to be smooothed smoothing (default=NULL will smooth top n.od.genes overdispersed genes)

`n.od.genes`

numeric If 'genes' is NULL, top n.od.genes of overdispersed genes are taken across all samples (default=500)

`fading`

numeric Level of fading of expression change from distance on the graph (parameter 'a' of the equation) (default=10)

`fading.const`

numeric Minimal penalty for each new edge during diffusion (parameter 'b' of the equation) (default=0.5)

`max.iters`

numeric Maximal number of diffusion iterations (default=15)

`tol`

numeric Tolerance after which the diffusion stops (default=5e-3)

`name`

string Name to save the correction (default='diffusion')

`verbose`

boolean Verbose mode (default=TRUE)

`count.matrix`

Alternative gene count matrix to correct (rows: genes, columns: cells; has to be dense matrix). Default: joint count matrix for all datasets.

`normalize`

boolean Whether to normalize values (default=TRUE)

smoothed expression of the input genes

`propagateLabels()`

Estimate labeling distribution for each vertex, based on a partial labeling of the cells. There are two methods used for the propagation to calculate the distribution of labels: "solver" and "diffusion". * "diffusion" (default) will estimate the labeling distribution for each vertex, based on provided labels using a random walk. * "solver" will propagate labels using the algorithm described by Zhu, Ghahramani, Lafferty (2003) <http://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf> Confidence values are then calculated by taking the maximum value from this distribution of labels, for each cell.

Conos$propagateLabels(labels, method = "diffusion", ...)

`labels`

Input labels

`method`

type of propagation. Either 'diffusion' or 'solver'. 'solver' gives better result but has bad asymptotics, so is inappropriate for datasets > 20k cells. (default='diffusion')

`...`

additional arguments for conos:::propagateLabels* functions

list with three fields: * labels = matrix with distribution of label probabilities for each vertex by rows. * uncertainty = 1 - confidence values * label.distribution = the distribution of labels calculated using either the methods "diffusion" or "solver"

`getClusterCountMatrices()`

Calculate pseudo-bulk expression matrices for clusters (by adding up, for each gene, all of the molecules detected for all cells in a given cluster in a given sample)

Conos$getClusterCountMatrices( clustering = NULL, groups = NULL, common.genes = TRUE, omit.na.cells = TRUE )

`clustering`

string Name of the clustering to use

`groups`

a factor on cells to use for coloring

`common.genes`

boolean Whether to bring individual sample matrices to a common gene list (default=TRUE)

`omit.na.cells`

boolean If set to FALSE, the resulting matrices will include a first column named 'NA' that will report total molecule counts for all of the cells that were not covered by the provided factor. (default=TRUE)

a list of per-sample uniform dense matrices with rows being genes, and columns being clusters

`getDatasetPerCell()`

applies 'getCellNames()' on all samples

Conos$getDatasetPerCell()

list of cellnames for all samples

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getDatasetPerCell()

`getJointCountMatrix()`

Retrieve joint count matrices

Conos$getJointCountMatrix(raw = FALSE)

`raw`

boolean If TRUE, return merged "raw" count matrices, using function getRawCountMatrix(). Otherwise, return the merged count matrices, using getCountMatrix(). (default=FALSE)

list of merged count matrices

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getJointCountMatrix()

`clone()`

The objects of this class are cloneable with this method.

Conos$clone(deep = FALSE)

`deep`

Whether to make a deep clone.

## ------------------------------------------------ ## Method `Conos$new` ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1) ## ------------------------------------------------ ## Method `Conos$buildGraph` ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE) ## ------------------------------------------------ ## Method `Conos$findCommunities` ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE) con$findCommunities(method = igraph::walktrap.community, steps=5) ## ------------------------------------------------ ## Method `Conos$getDatasetPerCell` ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getDatasetPerCell() ## ------------------------------------------------ ## Method `Conos$getJointCountMatrix` ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getJointCountMatrix()

[Package *conos* version 1.4.2 Index]