adjClust {adjclust}	R Documentation

Adjacency-constrained Clustering

Description

Adjacency-constrained hierarchical agglomerative clustering

Usage

adjClust(mat, type = c("similarity", "dissimilarity"), h = ncol(mat) - 1)

Arguments

mat

A similarity matrix or a dist object. Most sparse formats from sparseMatrix are allowed.

type

Type of matrix: "similarity" or "dissimilarity". Defaults to "similarity".

h

Band width. The similarity between two items is assumed to be 0 when the items are more than h positions apart. Default value is ncol(mat) - 1.
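To make the band-width assumption concrete, the sketch below zeroes every entry that lies more than h positions from the diagonal. The helper band_truncate is hypothetical and not part of adjclust; it only illustrates one reading of what "band of width h" means and makes no claim about how adjClust treats such entries internally.

```r
# Hypothetical helper (not an adjclust function): keep only the entries
# whose row and column indices differ by at most h, and set the rest to 0.
band_truncate <- function(mat, h) {
  mat[abs(row(mat) - col(mat)) > h] <- 0
  mat
}

sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)

# With h = 1 only the diagonal and its two neighbouring diagonals survive.
banded <- band_truncate(sim, h = 1)
print(banded)
```

With h = ncol(mat) - 1 (the default), no entry lies outside the band, so the matrix is left unchanged.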

Details

Adjacency-constrained hierarchical agglomerative clustering (HAC) is HAC in which each observation is associated with a position, and the clustering is constrained so that only adjacent clusters are merged. These methods are useful in various application fields, including ecology (Quaternary data) and bioinformatics (e.g., in Genome-Wide Association Studies (GWAS)).
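The adjacency constraint can be illustrated with a deliberately naive sketch in base R. It is not the algorithm implemented in adjclust: it uses average linkage on a dissimilarity matrix as a stand-in criterion, ignores the band structure, and recomputes all costs at every step. The function name naive_adjacent_hac is hypothetical. The only point it shows is that, at each step, the candidate merges are restricted to a cluster and its right-hand neighbour.

```r
# Naive adjacency-constrained agglomeration (illustrative only, not adjclust's method).
# d: symmetric dissimilarity matrix (or dist object) whose row order is the position order.
naive_adjacent_hac <- function(d) {
  d <- as.matrix(d)
  clusters <- as.list(seq_len(nrow(d)))  # each cluster is a run of contiguous indices
  merges <- list()
  while (length(clusters) > 1) {
    # Average-linkage cost between every cluster and its right-hand neighbour only.
    cost <- vapply(seq_len(length(clusters) - 1), function(k) {
      mean(d[clusters[[k]], clusters[[k + 1]]])
    }, numeric(1))
    k <- which.min(cost)
    merges[[length(merges) + 1]] <- list(
      left = clusters[[k]], right = clusters[[k + 1]], height = cost[k])
    # Merge the two neighbours; the result is still a contiguous run.
    clusters[[k]] <- c(clusters[[k]], clusters[[k + 1]])
    clusters[[k + 1]] <- NULL
  }
  merges
}

sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)
merges <- naive_adjacent_hac(sqrt(2 - 2 * sim))
# Print the two sides of each successive merge.
for (m in merges) cat(m$left, "|", m$right, "\n")
```

On this toy matrix, items 3 and 4 are merged first, then item 2 joins them, and item 1 is merged last; items 1 and 3 are never candidates for a direct merge because they are not adjacent.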

This function is a fast implementation of the method that takes advantage of sparse similarity matrices (i.e., matrices whose entries are 0 outside a diagonal band of width h). The method is fully described in Dehman (2015) and is based on a kernel version of the algorithm. The different options for the implementation are described in the package vignette entitled "Notes on CHAC implementation in adjclust".

Value

An object of class chac which describes the tree produced by the clustering process. The object is a list with the same elements as an object of class hclust (merge, height, order, labels, call, method, dist.method), and an extra element mat: the data on which the clustering is performed, possibly after the pre-transformations described in the vignette entitled "Notes on CHAC implementation in adjclust".
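As a minimal sketch of how the returned list might be inspected, the snippet below fits the toy matrix from the Examples section and looks at a few of the elements named above. It runs only if adjclust is installed, and the comments describe what the elements are expected to contain based on the description above rather than on verified output.

```r
# Guarded so the snippet is a no-op when adjclust is not installed.
if (requireNamespace("adjclust", quietly = TRUE)) {
  sim <- matrix(
    c(1.0, 0.1, 0.2, 0.3,
      0.1, 1.0, 0.4, 0.5,
      0.2, 0.4, 1.0, 0.6,
      0.3, 0.5, 0.6, 1.0), nrow = 4)
  fit <- adjclust::adjClust(sim, "similarity")

  print(class(fit))   # expected to include "chac"
  print(names(fit))   # list elements such as merge, height, order, mat
  print(fit$merge)    # one row per merge step
  print(fit$height)   # height associated with each merge
}
```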

References

Dehman A. (2015) Spatial Clustering of Linkage Disequilibrium Blocks for Genome-Wide Association Studies, PhD thesis, Université Paris Saclay.

Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019) Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics, Algorithms for Molecular Biology 14(22).

See Also

snpClust to cluster SNPs based on linkage disequilibrium

hicClust to cluster Hi-C data

Examples

sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)

## similarity, full width
fit1 <- adjClust(sim, "similarity")
plot(fit1)

## similarity, h < p-1
fit2 <- adjClust(sim, "similarity", h = 2)
plot(fit2)

## dissimilarity
dist <- as.dist(sqrt(2-(2*sim)))

## dissimilarity, full width
fit3 <- adjClust(dist, "dissimilarity")
plot(fit3)

## dissimilarity, h < p-1
fit4 <- adjClust(dist, "dissimilarity", h = 2)
plot(fit4)


[Package adjclust version 0.5.99 Index]