R: clustering_angular

clustering_angular_distance {MitoHEAR}

R Documentation

clustering_angular_distance

Description

For each base, an angular distance matrix between all pairs of samples is computed from the four allele frequencies. Only the angular distances corresponding to relevant_bases are then kept. If relevant_bases is NULL, only the angular distances corresponding to the bases whose relative distance variance among samples is above min_value are kept. Finally, the distance between each pair of samples is defined as the Euclidean distance of the angular distances over the bases that pass the previous filtering step. Hierarchical clustering is then performed on this final distance matrix using the function cutreeHybrid of the package dynamicTreeCut.
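The steps described above can be sketched in base R on toy data. This is only an illustration, not the package's implementation: it assumes that the angular distance between two samples at a base is the arc cosine of the cosine similarity of their four allele frequencies, the helper angular_distance and all data values are hypothetical, and stats::cutree stands in for cutreeHybrid so that the sketch runs without extra packages.

```r
# Toy allele frequencies (A, C, G, T) for 3 samples at 2 bases; values are made up.
allele_freq <- list(
  pos1 = rbind(cell1 = c(1, 0, 0, 0),
               cell2 = c(0, 1, 0, 0),
               cell3 = c(1, 0, 0, 0)),
  pos2 = rbind(cell1 = c(0.5, 0.5, 0, 0),
               cell2 = c(0.5, 0.5, 0, 0),
               cell3 = c(0.5, 0.5, 0, 0))
)

# Hypothetical helper: angle between the allele frequency vectors of each
# pair of samples (assumed definition of the angular distance).
angular_distance <- function(freq) {
  unit <- freq / sqrt(rowSums(freq^2))   # scale each sample to unit length
  cosine <- tcrossprod(unit)             # pairwise cosine similarities
  acos(pmin(pmax(cosine, -1), 1))        # clamp to [-1, 1] against rounding
}

# One angular distance matrix per base.
ang_per_base <- lapply(allele_freq, angular_distance)

# Euclidean combination across the retained bases (here: all of them).
sample_dist <- sqrt(Reduce(`+`, lapply(ang_per_base, function(d) d^2)))

# Hierarchical clustering on the final distance matrix.
# cutree is a simple stand-in for dynamicTreeCut::cutreeHybrid.
tree <- hclust(as.dist(sample_dist), method = "average")
clusters <- cutree(tree, k = 2)
print(round(sample_dist, 3))
print(clusters)
```

In this toy case only pos1 separates the samples, so cell1 and cell3 end up in one cluster and cell2 in another.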

Usage

clustering_angular_distance(
  heteroplasmy_matrix,
  allele_matrix,
  cluster,
  top_pos,
  deepSplit_param,
  minClusterSize_param,
  threshold = 0.2,
  min_value,
  index,
  relevant_bases = NULL,
  max_frac = 0.7
)

Arguments

`heteroplasmy_matrix`	Third element returned by get_heteroplasmy.
`allele_matrix`	Fourth element returned by get_heteroplasmy.
`cluster`	Vector specifying a partition of the samples.
`top_pos`	Numeric value. Number of bases to consider, sorted by decreasing distance variance among samples (see section Details below). If relevant_bases=NULL, then the bases used for hierarchical clustering are the ones whose relative variance (variance of the base divided by the sum of the variances of the top_pos bases) is above min_value.
`deepSplit_param`	Integer value between 0 and 4 for the deepSplit parameter of the function cutreeHybrid. See section Details below.
`minClusterSize_param`	Integer value specifying the minClusterSize parameter of the function cutreeHybrid. See section Details below.
`threshold`	Numeric value. If a base has heteroplasmy greater than or equal to threshold in more than max_frac of cells, then the base is not considered for downstream analysis.
`min_value`	Numeric value. If relevant_bases=NULL, then the bases used for hierarchical clustering are the ones whose relative variance (variance of the base divided by the sum of the variances of the top_pos bases) is above min_value.
`index`	Fifth element returned by get_heteroplasmy.
`relevant_bases`	Character vector of bases to consider as features for performing hierarchical clustering on samples. Default=NULL.
`max_frac`	Numeric value. If a base has heteroplasmy greater than or equal to threshold in more than max_frac of cells, then the base is not considered for downstream analysis.
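How threshold, max_frac, top_pos and min_value interact can be illustrated with a small base R sketch. It is a reading of the argument descriptions above, not the package's code: the matrix het, the vector dist_var and every number in them are hypothetical, and the per-base distance variance is taken as given rather than computed.

```r
# Toy heteroplasmy matrix: 4 cells x 3 bases; values are made up.
het <- cbind(pos1 = c(0.0, 0.1, 0.5, 0.0),
             pos2 = c(0.9, 0.8, 0.7, 0.1),
             pos3 = c(0.3, 0.3, 0.3, 0.3))
rownames(het) <- paste0("cell", 1:4)

threshold <- 0.2
max_frac <- 0.7

# Fraction of cells in which each base reaches the heteroplasmy threshold.
frac_above <- colMeans(het >= threshold)

# Bases at or above threshold in more than max_frac of cells are dropped.
kept_bases <- names(frac_above)[frac_above <= max_frac]

# Hypothetical per-base distance variances among samples.
dist_var <- c(pos1 = 0.50, pos2 = 0.30, pos3 = 0.15, pos4 = 0.05)
top_pos <- 3
min_value <- 0.2

# Keep the top_pos bases with the largest variance, then normalise by their sum.
top <- sort(dist_var, decreasing = TRUE)[seq_len(top_pos)]
rel_var <- top / sum(top)

# Bases whose relative variance is above min_value are used for clustering.
selected_bases <- names(rel_var)[rel_var > min_value]

print(kept_bases)
print(round(rel_var, 3))
print(selected_bases)
```

Here pos2 and pos3 are filtered out by the heteroplasmy rule because they reach threshold in 75% and 100% of cells, while in the separate variance example only pos1 and pos2 have a relative variance above min_value.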

Value

It returns a list with 4 elements:

`classification`	Data frame with two columns and the same number of rows as heteroplasmy_matrix. The first column is the old cluster annotation provided by cluster. The second column is the new cluster annotation obtained with hierarchical clustering on the distance matrix based on heteroplasmy values.
`dist_ang_matrix`	Distance matrix based on heteroplasmy values, as defined in the section Details.
`top_bases_dist`	Vector of bases used for hierarchical clustering. If relevant_bases is not NULL, then top_bases_dist=NULL.
`common_idx`	Vector of indices of samples for which hierarchical clustering is performed. If index is NULL, then common_idx=NULL.

Author(s)