get_nn_importance {ClustAssess}    R Documentation
Assess Graph Building Parameters
Description
Evaluates clustering stability when changing the values of different parameters involved in the graph building step, namely the base embedding, the graph type and the number of neighbours.
Usage
get_nn_importance(
  object,
  n_neigh_sequence,
  n_repetitions = 100,
  seed_sequence = NULL,
  graph_reduction_type = "PCA",
  ecs_thresh = 1,
  ncores = 1,
  transpose = (graph_reduction_type == "PCA"),
  graph_type = 2,
  algorithm = 4,
  ...
)
Arguments
object
The data matrix. If the graph reduction type is PCA, the object should be an expression matrix with features on rows and observations on columns; in the case of UMAP, the user can also provide a matrix corresponding to a PCA embedding. See also the transpose argument.
n_neigh_sequence
A sequence of values for the number of nearest neighbours.
n_repetitions
The number of times the pipeline is applied with different seeds; ignored if seed_sequence is provided by the user.
seed_sequence
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100.
graph_reduction_type
The embedding on which the graph is built: either "PCA" or "UMAP".
ecs_thresh
The element-centric similarity (ECS) threshold used for merging similar clusterings.
ncores
The number of parallel R instances that will run the code. If the value is set to 1, the code will be run sequentially.
transpose
Logical: whether the input object should be transposed. Set to FALSE if the input is an observations x features matrix and to TRUE if it is a features x observations matrix.
graph_type
An integer indicating whether the graph should be unweighted (0), weighted (1) or both (2).
algorithm
An index indicating which community detection algorithm will be used: Louvain (1), Louvain refined (2), SLM (3) or Leiden (4). More details can be found in Seurat's documentation.
...
Additional arguments passed to the irlba function (for example, nv).
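The argument defaults above can be checked in plain R before calling the function. The sketch below uses base R only and does not call ClustAssess. It assumes that the NULL seed_sequence default is equivalent to seq(from = 1, by = 100, length.out = n_repetitions), which is our reading of the description rather than the package's source; graph_type_labels and algorithm_labels are hypothetical lookup vectors that only restate the integer codes listed above.

```r
n_repetitions <- 5

# Assumed equivalent of the default seed sequence: starts at 1, step of 100,
# one seed per repetition (our reading of the seed_sequence description)
seed_sequence <- seq(from = 1, by = 100, length.out = n_repetitions)
print(seed_sequence)

# Hypothetical lookup vectors restating the integer codes from this page
graph_type_labels <- c("unweighted", "weighted", "both")         # codes 0, 1, 2
algorithm_labels <- c("Louvain", "Louvain refined", "SLM", "Leiden")  # codes 1..4
print(graph_type_labels[2 + 1])  # graph_type = 2 (0-based code, so add 1)
print(algorithm_labels[4])       # algorithm = 4

# transpose defaults to TRUE for PCA: the input is expected as
# features x observations and is flipped to observations x features
graph_reduction_type <- "PCA"
transpose <- (graph_reduction_type == "PCA")
features_by_obs <- matrix(runif(10 * 200), nrow = 10)  # 10 features, 200 observations
obs_by_features <- if (transpose) t(features_by_obs) else features_by_obs
print(dim(obs_by_features))
```

Because R vectors are 1-indexed, the 0-based graph_type codes need an offset of one in the lookup, while the algorithm codes can index the vector directly.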
Value
A list with three fields:
n_neigh_k_corresp - list containing the number of clusters obtained by running the pipeline multiple times with different seeds, numbers of neighbors and graph types (weighted vs unweighted)
n_neigh_ec_consistency - list containing the EC consistency of the partitions obtained across multiple runs when changing the number of neighbors or the graph type
n_different_partitions - the number of different partitions obtained for each number of neighbors
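The n_different_partitions and n_neigh_ec_consistency fields both rest on comparing partitions with ECS. The sketch below is a base R illustration, not ClustAssess's implementation. It assumes that, for flat non-overlapping partitions, the ECS of an element equals the size of the intersection of its two clusters divided by the size of the larger one; that partitions are merged greedily when their mean ECS reaches ecs_thresh; and that EC consistency is the per-element ECS averaged over all pairs of runs. The helpers ecs_per_element, count_distinct_partitions and ec_consistency are hypothetical.

```r
# Hypothetical per-element ECS between two flat partitions (label vectors),
# assuming the closed form |A_i intersect B_i| / max(|A_i|, |B_i|)
ecs_per_element <- function(a, b) {
  stopifnot(length(a) == length(b))
  sapply(seq_along(a), function(i) {
    in_a <- a == a[i]  # members of element i's cluster in partition a
    in_b <- b == b[i]  # members of element i's cluster in partition b
    sum(in_a & in_b) / max(sum(in_a), sum(in_b))
  })
}

# Hypothetical greedy merging: a partition joins the first representative
# whose mean ECS with it reaches ecs_thresh; otherwise it becomes a new one
count_distinct_partitions <- function(partitions, ecs_thresh = 1) {
  representatives <- list()
  for (p in partitions) {
    merged <- FALSE
    for (r in representatives) {
      if (mean(ecs_per_element(p, r)) >= ecs_thresh) {
        merged <- TRUE
        break
      }
    }
    if (!merged) representatives[[length(representatives) + 1]] <- p
  }
  length(representatives)
}

# Hypothetical EC consistency: per-element ECS averaged over all pairs of runs
ec_consistency <- function(partitions) {
  pairs <- combn(length(partitions), 2)
  scores <- apply(pairs, 2, function(idx) {
    ecs_per_element(partitions[[idx[1]]], partitions[[idx[2]]])
  })
  rowMeans(scores)  # one value per element
}

# Three toy runs over six elements
runs <- list(
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 1, 1, 3, 3),  # same partition as run 1, labels permuted
  c(1, 1, 1, 2, 2, 2)
)
print(count_distinct_partitions(runs, ecs_thresh = 1))
print(round(ec_consistency(runs), 3))
```

With ecs_thresh = 1, only partitions that are identical up to a relabeling of the clusters are merged, which is why the first two runs count as a single partition here; lowering the threshold lets the third run merge as well.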
Examples
set.seed(2021)
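# create an artificial expression matrix: 200 observations (rows) x 10 features (columns),
# where the second group of 100 observations takes higher values
expr_matrix = matrix(c(runif(100*10), runif(100*10, min=5, max=6)), nrow = 200, byrow = TRUE)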
rownames(expr_matrix) = as.character(1:200)
nn_importance_obj = get_nn_importance(object = expr_matrix,
  n_neigh_sequence = c(10, 15, 20),
  n_repetitions = 10,
  graph_reduction_type = "PCA",
  algorithm = 1,
  transpose = FALSE, # the matrix is already observations x features, so we won't transpose it
  # the following parameter is used by the irlba function and is not mandatory
  nv = 2)
plot_n_neigh_ecs(nn_importance_obj)