Conos {conos} R Documentation

Conos R6 class

Description

The class encompasses sample collections, providing methods for calculating and visualizing joint graph and communities.

Public fields

samples

list of samples (Pagoda2 or Seurat objects)

pairs

pairwise alignment results

graph

alignment graph

clusters

list of clustering results named by clustering type

expression.adj

embeddings

list of joint embeddings

embedding

joint embedding

n.cores

number of cores

misc

Methods

Method new()

initialize Conos class



Method addSamples()

Initialize or add a set of samples to the conos panel. Note: this will simply add samples, but will not update graph, clustering, etc.

Conos$addSamples(x, replace = FALSE, verbose = FALSE) Arguments x a named list of pagoda2 or Seurat objects (one per sample) replace boolean Whether the existing samples should be purged before adding new ones (default=FALSE) verbose boolean Whether to provide verbose output (default=FALSE) Returns invisible view of the full sample list Method buildGraph() Build the joint graph that encompasses all the samples, establishing weighted inter-sample cell-to-cell links Usage Conos$buildGraph(
k = 15,
k.self = 10,
k.self.weight = 0.1,
alignment.strength = NULL,
space = "PCA",
matching.method = "mNN",
metric = "angular",
k1 = k,
data.type = "counts",
l2.sigma = 1e+05,
var.scale = TRUE,
ncomps = 40,
n.odgenes = 2000,
exclude.samples = NULL,
common.centering = TRUE,
verbose = TRUE,
base.groups = NULL,
append.global.axes = TRUE,
append.decoys = TRUE,
decoy.threshold = 1,
n.decoys = k * 2,
score.component.variance = FALSE,
snn = FALSE,
snn.quantile = 0.9,
min.snn.jaccard = 0,
min.snn.weight = 0,
snn.k.self = k.self,
balance.edge.weights = FALSE,
balancing.factor.per.cell = NULL,
same.factor.downweight = 1,
k.same.factor = k,
balancing.factor.per.sample = NULL
)
Arguments
k

integer integer Size of the inter-sample neighborhood (default=15)

k.self

integer Size of the with-sample neighborhoods (default=10).

k.self.weight

numeric Weight multiplier on the intra-sample edges relative to inter-sample edges (default=0.1)

alignment.strength

numeric Alignment strength (default=NULL will result in alignment.strength=0)

space

character Reduced expression space used to establish putative alignments between pairs of samples (default='PCA'). Currently supported spaces are: — "CPCA" Common principal component analysis — "JNMF" Joint NMF — "genes" Gene expression space (log2 transformed) — "PCA" Principal component analysis — "CCA" Canonical correlation analysis — "PMA" (Penalized Multivariate Analysis <https://cran.r-project.org/web/packages/PMA/index.html>)

matching.method

character Matching method (default='mNN'). Currently supported methods are "NN" (nearest neighbors) or "mNN" (mututal nearest neighbors).

metric

character Distance metric to measure similarity (default='angular'). Currenlty supported metrics are "angular" and "L2".

k1

numeric Neighborhood radius for identifying mutually-matching neighbors (default=k). Note that k1 must be greater than or equal to k, i.e. k1>=k. Increasing k1 beyond k will lead to more aggressive alignment of distinct subpopulations (i.e. increased alignment strengths).

data.type

character Type of data type in the input pagoda2 objects within r.n (default='counts').

l2.sigma

numeric L2 distances get transformed as exp(-d/sigma) using this value (default=1e5)

var.scale

boolean Whether to use common variance scaling (default=TRUE). If TRUE, use geometric means for variance, as we're trying to focus on the common variance components. See scaledMatricesP2() code.

ncomps

integer Number of components (default=40)

n.odgenes

integer Number of overdispersed genes to be used in each pairwise alignment (default=2000)

matching.mask

an optional matrix explicitly specifying which pairs of samples should be compared (a symmetrical matrix of logical values with row and column names corresponding to sample names). (default=NULL). By default, comparisons between all paris are allowed. The argument can be used to exclude comparisons across certain pairs of samples (e.g. techincal replicates, which are expected to show very high similarity).

exclude.samples

optional list of sample names that should be excluded from the alignment and the resulting graph (default=NULL)

common.centering

boolean When calculating reduced expression space for a given sample pair, whether the expression of genes should be centered using the mean from both samples (TRUE) or using the mean within each sample (FALSE) (default=TRUE)

verbose

boolean Whether to provide verbose output (default=TRUE)

base.groups

an optional factor on cells specifying previously-obtained cell grouping to be used for adjusting the sample alignment (default: NULL). Specifically, cell clusters specfiieid by the base.groups can be used to i) calculate global expression axes which are appended to the overall set of eigenvectors, ii) adding decoy cells.

append.global.axes

boolean Whether to project samples on global expression axes, as defined by pre-defined (typically crude) set of cell subpopulations as specified by the base.gruops parameter (default=TRUE, but works only if base.groups is specified)

append.decoys

boolean Whether to use pre-defined cell groups (specified by base.groups) to append decoy cells to the samples which are otherwise lacking any of the pre-specified cell groups (default=TRUE, but works only if base.groups is specified). The decoy cells can reduce the number of erroneous matches in highly heterogeneous sample collections, where some of the samples lack entire cell subpopulations which are found in other samples. The approach only works if the base.groups (typically a crude clustering of top-level cell types) can be established with a reasonable confidence.

decoy.threshold

integer Minimal number of cells of a given cell type that should exist in a given sample (according to base.groups) to avoid addition of decoy cells to that sample for the purposes of alignment (default=1)

n.decoys

integer Number of decoy cells that should be added to a sample that had less than decoy.threshold cells of a given cell type (default=k*2)

score.component.variance

boolean Whether to score the amount of total variance explained by different components (default=FALSE as it takes extra time to calculate)

snn

boolean Whether to transform the joint graph by computing a shared nearest neighborhood graph (analogous to Seurat 3), further weighting the edges between two matched cells based on the similarity (measured by Jaccard coefficient) of all of their predicted neighbors (across all of the samples) (default: FALSE)

snn.quantile

numeric Specifies how the shared neighborhood graph transformation will determine final edge weights. If snn.quantile=NULL, the edge weight will be simply equal to the Jaccard coefficient of the neighborhoods. If snn.quantile is a vector of two numeric values (p1, p2), they will be treated as quantile probabilities, and quantile values (q1,q2) on the set of all Jaccard coefficients (for all edges) will be determiend. The edge weights will then be reset, so that edges with Jaccard coefficients below or equal to q1 will be set to 0, and those with coefficients >=q2 will be set to 1. The rest of the weights will be mapped uniformly from [q1,q2]->[0,1] range. If a single numeric value is supplied, it will be treated as a symmetric quantile probability (i.e. snn.quantile=0.8 is equivalent to specifying snn.quantile=c(1-0.8,0.8)). (default: 0.9)

min.snn.jaccard

numeric Minimum Jaccard coefficient required for a shared neighborhood graph edge (default: 0). The edges with Jaccard coefficients below this threshold will be removed (i.e. weight set to 0)

min.snn.weight

numeric Shared nearest neighbor procedure will adjust the weights of the edges, and even eliminate some of the edges (by setting their weight to zero). The min.snn.weight parameter allows to set a minimal adjusted edge weight, so that the edge weight is never reduced beyond this level (and hence never deleted) (default: 0 - no adjustments)

snn.k.self

integer Size of the within-sample neighorhood to be used in shared nearest neighbor calculations (default=k.self)

balance.edge.weights

boolean Whether to balance edge weights to control for a cell- or sample- specific factor (default=FALSE)

balancing.factor.per.cell

A per-cell factor (discrete factor, named with cell names) specifying a design difference should be controlled for by adjusting edge weights in the joint graph (default=NULL)

same.factor.downweight

numeric Optional weighting factor for edges connecting cells with the same cell factor level per cell balancing (default=1.0)

k.same.factor

integer An neighborhood size that should be used when aligning samples of the same balancing.factor.per.sample level. Setting a value smaller than k will lead to reduction of alingment strenth within the sample batches (default=k)

balancing.factor.per.sample

A covariate factor per sample that should be controlled for by adjusting edge weights in the joint graph (default=NULL)

Returns

joint graph to be used for downstream analysis

Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)



Method getDifferentialGenes()

Calculate genes differentially expressed between cell clusters. Estimates base mean, z-score, p-values, specificity, precision, expressionFraction, AUC (if append.auc=TRUE)

Conos$getDifferentialGenes( clustering = NULL, groups = NULL, z.threshold = 3, upregulated.only = FALSE, verbose = TRUE, append.specificity.metrics = TRUE, append.auc = TRUE ) Arguments clustering character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)

groups

a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)

z.threshold

numeric Minimum absolute value of a Z score for which the genes should be reported (default=3.0).

upregulated.only

boolean If TRUE, will report only genes significantly upregulated in each cluster; otherwise both up- and down-regulated genes will be reported (default=FALSE)

verbose

boolean Whether to provide verbose output (default=TRUE)

append.specificity.metrics

boolean Whether to append specificity metrics (default=TRUE)

append.auc

boolean Whether to append AUC scores (default=TRUE)

Returns

list of DE results; each is a data frame with rows corresponding to the differentially expressed genes, and columns listing log2 fold change (M), signed Z scores (both raw and adjusted for mulitple hypothesis using BH correction), optional specificty/sensitivity and AUC metrics.

Method findCommunities()

Find cell clusters (as communities on the joint graph)

Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
con$findCommunities(method = igraph::walktrap.community, steps=5)  Method plotPanel() Plot panel of individual embeddings per sample with joint coloring Usage Conos$plotPanel(
clustering = NULL,
groups = NULL,
colors = NULL,
gene = NULL,
use.local.clusters = FALSE,
plot.theme = NULL,
use.common.embedding = FALSE,
embedding = NULL,
...
)
Arguments
clustering

character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default=NULL - if groups are not specified, the first clustering will be used) groups a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default=NULL, see clustering) colors a color factor (named with cell names) use for cell coloring gene show expression of a gene use.local.clusters boolean Whether clusters should be taken from the individual samples; otherwise joint clusters in the conos object will be used (see clustering) (default=FALSE). plot.theme string Theme for the plot, passed to plotSamples() (default=NULL) use.common.embedding boolean Whether a joint embedding in the conos object should be used (or embeddings determined for the individual samples) (default=FALSE) embedding (default=NULL) If a character value is passed, it is interpreted as an embedding name (a name of a joint embedding in conos when use.commmon.embedding=TRUE, or a name of an embedding within the individual objects when use.common.embedding=FALSE). If a matrix is passed, it is interpreted as an actual embedding (then first two columns are interpreted as x/y coordinates, row names must be cell names). If NULL, the default embedding will be used. adj.list an optional list of additional ggplot2 directions to apply (default=NULL) ... Additional parameters passed to plotSamples(), plotEmbeddings(), sccore::embeddingPlot(). Returns cowplot grid object with the panel of plots Method embedGraph() Generate an embedding of a joint graph Usage Conos$embedGraph(
method = "largeVis",
embedding.name = method,
M = 1,
gamma = 1,
alpha = 0.1,
perplexity = NA,
sgd_batches = 1e+08,
seed = 1,
verbose = TRUE,
target.dims = 2,
...
)
Arguments
method

Embedding method (default='largeVis'). Currently 'largeVis' and 'UMAP' are supported.

embedding.name

character Optional name of the name of the embedding set by user to store multiple embeddings (default: method name)

M

numeric (largeVis) The number of negative edges to sample for each positive edge to be used (default=1)

gamma

numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=1)

alpha

numeric (largeVis) Hyperparameter used in the default distance function, 1 / (1 + α \dot ||y_i - y_j||^2) (default=0.1). The function relates the distance between points in the low-dimensional projection to the likelihood that the two points are nearest neighbors. Increasing α tends to push nodes and their neighbors closer together; decreasing α produces a broader distribution. Setting α to zero enables the alternative distance function. α below zero is meaningless.

perplexity

(largeVis) The perplexity passed to largeVis (default=NA)

sgd_batches

(largeVis) The number of edges to process during SGD (default=1e8). Defaults to a value set based on the size of the dataset. If the parameter given is between 0 and 1, the default value will be multiplied by the parameter.

seed

numeric Random seed for the largeVis algorithm (default=1)

verbose

boolean Whether to provide verbose output (default=TRUE)

target.dims

numeric Number of dimensions for the reduction (default=2). Higher dimensions can be used to generate embeddings for subsequent reductions by other methods, such as tSNE

...

Method plotClusterStability()

Plot cluster stability statistics.

Conos$plotClusterStability(clustering = NULL, what = "all") Arguments clustering string Name of the clustering result to show (default=NULL) what string Show a specific plot (ari - adjusted rand index, fjc - flat Jaccard, hjc - hierarchical Jaccard, dend - cluster dendrogram, all - everything except 'dend') (default='all') Returns cluster stability statistics Method plotGraph() Plot joint graph Usage Conos$plotGraph(
color.by = "cluster",
clustering = NULL,
embedding = NULL,
groups = NULL,
colors = NULL,
gene = NULL,
plot.theme = NULL,
subset = NULL,
...
)
Arguments
color.by

character A shortcut to color the plot by 'cluster' or by 'sample' (default: 'cluster'). If any other string is input, an error is thrown.

clustering

a character name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used) embedding A character name of an embedding, or a matrix of the actual embedding (rownames should correspond to cells, first to columns to x/y coordinates). If NULL (default: NULL), the latest generated embedding will be used groups a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering) colors a color factor (named with cell names) use for cell coloring (default=NULL) gene Show expression of a gene (default=NULL) plot.theme Theme for the plot, passed to sccore::embeddingPlot() (default=NULL) subset A subset of cells to show (default: NULL - shows all the cells) ... Additional parameters passed to sccore::embeddingPlot() Returns ggplot2 plot of joint graph Method correctGenes() Smooth expression of genes to minimize the batch effect between samples Use diffusion of expression on graph with the equation dv = exp(-a * (v + b)) Usage Conos$correctGenes(
genes = NULL,
n.od.genes = 500,
max.iters = 15,
tol = 0.005,
name = "diffusion",
verbose = TRUE,
count.matrix = NULL,
normalize = TRUE
)
Arguments
genes

List of genes to be smooothed smoothing (default=NULL will smooth top n.od.genes overdispersed genes)

n.od.genes

numeric If 'genes' is NULL, top n.od.genes of overdispersed genes are taken across all samples (default=500)

fading

numeric Level of fading of expression change from distance on the graph (parameter 'a' of the equation) (default=10)

fading.const

numeric Minimal penalty for each new edge during diffusion (parameter 'b' of the equation) (default=0.5)

max.iters

numeric Maximal number of diffusion iterations (default=15)

tol

numeric Tolerance after which the diffusion stops (default=5e-3)

name

string Name to save the correction (default='diffusion')

verbose

boolean Verbose mode (default=TRUE)

count.matrix

Alternative gene count matrix to correct (rows: genes, columns: cells; has to be dense matrix). Default: joint count matrix for all datasets.

normalize

boolean Whether to normalize values (default=TRUE)

Returns

smoothed expression of the input genes

Method propagateLabels()

Estimate labeling distribution for each vertex, based on a partial labeling of the cells. There are two methods used for the propagation to calculate the distribution of labels: "solver" and "diffusion". * "diffusion" (default) will estimate the labeling distribution for each vertex, based on provided labels using a random walk. * "solver" will propagate labels using the algorithm described by Zhu, Ghahramani, Lafferty (2003) <http://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf> Confidence values are then calculated by taking the maximum value from this distribution of labels, for each cell.

Conos$propagateLabels(labels, method = "diffusion", ...) Arguments labels Input labels method type of propagation. Either 'diffusion' or 'solver'. 'solver' gives better result but has bad asymptotics, so is inappropriate for datasets > 20k cells. (default='diffusion') ... additional arguments for conos:::propagateLabels* functions Returns list with three fields: * labels = matrix with distribution of label probabilities for each vertex by rows. * uncertainty = 1 - confidence values * label.distribution = the distribution of labels calculated using either the methods "diffusion" or "solver" Method getClusterCountMatrices() Calculate pseudo-bulk expression matrices for clusters (by adding up, for each gene, all of the molecules detected for all cells in a given cluster in a given sample) Usage Conos$getClusterCountMatrices(
clustering = NULL,
groups = NULL,
common.genes = TRUE,
omit.na.cells = TRUE
)
Arguments
clustering

string Name of the clustering to use

groups

a factor on cells to use for coloring

common.genes

boolean Whether to bring individual sample matrices to a common gene list (default=TRUE)

omit.na.cells

boolean If set to FALSE, the resulting matrices will include a first column named 'NA' that will report total molecule counts for all of the cells that were not covered by the provided factor. (default=TRUE)

Returns

a list of per-sample uniform dense matrices with rows being genes, and columns being clusters

Method getDatasetPerCell()

applies 'getCellNames()' on all samples

Arguments
raw

boolean If TRUE, return merged "raw" count matrices, using function getRawCountMatrix(). Otherwise, return the merged count matrices, using getCountMatrix(). (default=FALSE)

Returns

list of merged count matrices

Examples
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getJointCountMatrix()



Method clone()

The objects of this class are cloneable with this method.

Conos$clone(deep = FALSE) Arguments deep Whether to make a deep clone. Examples  ## ------------------------------------------------ ## Method Conos$new
## ------------------------------------------------

con <- Conos$new(small_panel.preprocessed, n.cores=1) ## ------------------------------------------------ ## Method Conos$buildGraph
## ------------------------------------------------

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)

## ------------------------------------------------
## Method Conos$findCommunities ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE) con$findCommunities(method = igraph::walktrap.community, steps=5)

## ------------------------------------------------
## Method Conos$getDatasetPerCell ## ------------------------------------------------ con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getDatasetPerCell() ## ------------------------------------------------ ## Method Conos$getJointCountMatrix
## ------------------------------------------------

con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getJointCountMatrix()



[Package conos version 1.4.2 Index]