clustering_angular_distance {MitoHEAR}    R Documentation

clustering_angular_distance

Description

For each base, a matrix of angular distances between every pair of samples is computed from the four allele frequencies. Only the angular distances corresponding to relevant_bases are then kept. If relevant_bases is NULL, only the angular distances corresponding to the bases whose relative distance variance among samples is above min_value are kept. Finally, the distance between each pair of samples is defined as the Euclidean distance of the angular distances corresponding to the bases that pass the previous filtering step. Hierarchical clustering is then performed on this final distance matrix using the function cutreeHybrid of the package dynamicTreeCut.
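The steps above can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the package's implementation: the angular distance is assumed to be the arccosine of the cosine similarity between two four-allele frequency vectors, the "distance variance" of a base is assumed to be the variance of its pairwise angular distances, the 3D array layout of the toy input is invented for the sketch, and hclust with cutree stands in for cutreeHybrid so that the example runs without extra packages.

```r
# Toy allele counts: 6 samples x 5 bases x 4 alleles (layout assumed for this sketch).
set.seed(1)
n_samples <- 6
n_bases <- 5
counts <- array(rpois(n_samples * n_bases * 4, lambda = 20) + 1,
                dim = c(n_samples, n_bases, 4),
                dimnames = list(paste0("cell", 1:n_samples),
                                paste0("base", 1:n_bases),
                                c("A", "C", "G", "T")))
# Convert counts to allele frequencies (the four alleles of each base sum to 1).
freq <- sweep(counts, c(1, 2), apply(counts, c(1, 2), sum), "/")

# Assumed definition: angle between two allele frequency vectors.
angular_distance <- function(u, v) {
  cos_sim <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  acos(min(1, max(-1, cos_sim)))  # clamp to guard against rounding error
}

# ang[p, b]: angular distance for sample pair p at base b.
pairs <- combn(n_samples, 2)
ang <- sapply(seq_len(n_bases), function(b) {
  apply(pairs, 2, function(p) angular_distance(freq[p[1], b, ], freq[p[2], b, ]))
})
colnames(ang) <- dimnames(freq)[[2]]

# Keep the bases whose relative variance among the top_pos bases exceeds min_value.
top_pos <- 4
min_value <- 0.1
dist_var <- apply(ang, 2, var)
top <- sort(dist_var, decreasing = TRUE)[seq_len(top_pos)]
rel_var <- top / sum(top)
kept <- names(rel_var)[rel_var > min_value]

# Euclidean combination of the angular distances over the kept bases.
pair_dist <- sqrt(rowSums(ang[, kept, drop = FALSE]^2))
sample_names <- dimnames(freq)[[1]]
dist_mat <- matrix(0, n_samples, n_samples,
                   dimnames = list(sample_names, sample_names))
dist_mat[t(pairs)] <- pair_dist       # fill the upper triangle
dist_mat <- dist_mat + t(dist_mat)    # mirror to the lower triangle

# Stand-in clustering step (the package uses dynamicTreeCut::cutreeHybrid instead).
tree <- hclust(as.dist(dist_mat), method = "average")
new_cluster <- cutree(tree, k = 2)
```

If dynamicTreeCut is installed, the last step could be replaced by a call such as dynamicTreeCut::cutreeHybrid(dendro = tree, distM = dist_mat, deepSplit = 1, minClusterSize = 2), where the parameter values are arbitrary choices for the toy data.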

Usage

clustering_angular_distance(
  heteroplasmy_matrix,
  allele_matrix,
  cluster,
  top_pos,
  deepSplit_param,
  minClusterSize_param,
  threshold = 0.2,
  min_value,
  index,
  relevant_bases = NULL,
  max_frac = 0.7
)

Arguments

heteroplasmy_matrix

Third element returned by get_heteroplasmy.

allele_matrix

Fourth element returned by get_heteroplasmy.

cluster

Vector specifying a partition of the samples.

top_pos

Numeric value. Number of bases to retain after sorting by decreasing distance variance among samples (see Description). If relevant_bases=NULL, the bases used for hierarchical clustering are those whose relative variance (variance of the base divided by the sum of the variances of the top_pos bases) is above min_value.

deepSplit_param

Integer value between 0 and 4 for the deepSplit parameter of the function cutreeHybrid. See the cutreeHybrid documentation linked under See Also.

minClusterSize_param

Integer value specifying the minClusterSize parameter of the function cutreeHybrid. See the cutreeHybrid documentation linked under See Also.

threshold

Numeric value. If a base has heteroplasmy greater than or equal to threshold in more than max_frac of cells, the base is excluded from downstream analysis.

min_value

Numeric value. If relevant_bases=NULL, the bases used for hierarchical clustering are those whose relative variance (variance of the base divided by the sum of the variances of the top_pos bases) is above min_value.

index

Fifth element returned by get_heteroplasmy.

relevant_bases

Character vector of bases to use as features for hierarchical clustering of the samples. Default is NULL.

max_frac

Numeric value. If a base has heteroplasmy greater than or equal to threshold in more than max_frac of cells, the base is excluded from downstream analysis.
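The interplay of threshold and max_frac can be illustrated with a small base R sketch. It follows the rule as worded in the argument descriptions; the toy matrix and the assumption that rows are cells and columns are bases are made up for the example, and the package's internal code may differ.

```r
# Toy heteroplasmy matrix: rows are cells, columns are bases (assumed layout).
het <- matrix(c(0.9, 0.8, 0.7, 0.1,   # base1: >= 0.2 in 3 of 4 cells
                0.0, 0.3, 0.1, 0.0,   # base2: >= 0.2 in 1 of 4 cells
                0.5, 0.6, 0.0, 0.0),  # base3: >= 0.2 in 2 of 4 cells
              nrow = 4,
              dimnames = list(paste0("cell", 1:4), paste0("base", 1:3)))

threshold <- 0.2
max_frac <- 0.7

# Fraction of cells in which each base reaches the heteroplasmy threshold.
frac_above <- colMeans(het >= threshold)

# A base is dropped when that fraction is larger than max_frac.
keep <- frac_above <= max_frac
het_filtered <- het[, keep, drop = FALSE]
```

Here base1 reaches the threshold in 75% of the cells, which exceeds max_frac = 0.7, so only base2 and base3 remain.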

Value

It returns a list with 4 elements:

classification

Data frame with two columns and as many rows as heteroplasmy_matrix. The first column is the old cluster annotation provided by cluster. The second column is the new cluster annotation obtained with hierarchical clustering on the distance matrix based on heteroplasmy values.

dist_ang_matrix

Distance matrix based on heteroplasmy values, as defined in the Description.

top_bases_dist

Vector of bases used for hierarchical clustering. If relevant_bases is not NULL, then top_bases_dist=NULL.

common_idx

Vector of indices of the samples on which hierarchical clustering is performed. If index is NULL, then common_idx=NULL.
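One way to compare the old and new annotations in classification is a cross-tabulation. The sketch below uses a mock data frame with hypothetical column names (old_cluster, new_cluster) in place of real output, and accesses the columns by position because the actual column names are not stated here.

```r
# Mock stand-in for the classification element (column names are hypothetical).
classification <- data.frame(
  old_cluster = c("a", "a", "b", "b", "b"),
  new_cluster = c(1, 1, 2, 2, 1)
)

# Cross-tabulate old against new cluster labels, using column positions.
agreement <- table(old = classification[[1]], new = classification[[2]])
print(agreement)
```

Rows of the table are the old clusters and columns are the new ones, so off-diagonal counts show samples that were reassigned.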

Author(s)

Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de

See Also

https://www.rdocumentation.org/packages/dynamicTreeCut/versions/1.63-1/topics/cutreeHybrid


[Package MitoHEAR version 0.1.0 Index]