Adjacency-constrained Clustering of Single Nucleotide Polymorphisms


Adjacency-constrained hierarchical agglomerative clustering of Single Nucleotide Polymorphisms based on Linkage Disequilibrium


snpClust(x, h = ncol(x) - 1, stats = c("R.squared", ""))



either a genotype matrix of class SnpMatrix/matrix or a linkage disequilibrium matrix of class dgCMatrix. In the latter case, the LD values are expected to be in [0,1].


band width. If not provided, h defaults to p - 1, where p is the number of columns of x.


a character vector specifying the linkage disequilibrium measures to be calculated (using the ld function) when x is a genotype matrix. Only "R.squared" and "" are allowed, see Details.


Adjacency-constrained hierarchical agglomerative clustering (HAC) is HAC in which each observation is associated with a position, and the clustering is constrained so that only adjacent clusters are merged. SNPs are clustered based on their similarity as measured by linkage disequilibrium.
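The adjacency constraint can be illustrated with a toy sketch in base R. It is not the algorithm used by snpClust: the function name toy_adjacent_merge, the matrix sim, and the choice of average similarity as the merge criterion are all assumptions made for illustration. The point is only that, at each step, the candidate merges are restricted to neighbouring clusters.

```r
# Toy adjacency-constrained agglomeration (illustration only, NOT the snpClust algorithm).
# `sim` is a hypothetical symmetric similarity matrix with values in [0, 1].
toy_adjacent_merge <- function(sim) {
  p <- nrow(sim)
  clusters <- as.list(seq_len(p))   # each cluster is a contiguous run of positions
  merges <- list()
  while (length(clusters) > 1) {
    # average similarity between each cluster and its right-hand neighbour
    link <- vapply(seq_len(length(clusters) - 1), function(i) {
      mean(sim[clusters[[i]], clusters[[i + 1]], drop = FALSE])
    }, numeric(1))
    best <- which.max(link)         # only adjacent pairs are candidates
    merges[[length(merges) + 1]] <- list(left = clusters[[best]],
                                         right = clusters[[best + 1]])
    clusters[[best]] <- c(clusters[[best]], clusters[[best + 1]])
    clusters[[best + 1]] <- NULL    # drop the absorbed cluster
  }
  merges
}

sim <- matrix(c(1.0, 0.9, 0.1, 0.1,
                0.9, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.8,
                0.1, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)
merges <- toy_adjacent_merge(sim)
```

With this input, positions 1 and 2 are merged first, then positions 3 and 4, and finally the two resulting blocks. Non-adjacent pairs such as (1, 3) are never considered.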

In the special case where genotypes are given as input and the corresponding LD matrix has missing entries, the clustering cannot be performed. This can typically happen when there is insufficient variability in the sample genotypes. In this special case, the indices of the SNP pairs which yield missing values are returned.
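As a rough illustration of how insufficient variability produces missing values, the sketch below uses squared Pearson correlation from base R as a stand-in for an LD measure. This is an assumption: snpStats::ld computes LD differently, and the genotype matrix geno is made up. A monomorphic SNP has zero variance, so its correlation with every other SNP is undefined.

```r
# Hypothetical genotype matrix coded 0/1/2: 6 individuals, 3 SNPs.
# snp2 is monomorphic, so its correlation with any other SNP is NA.
geno <- cbind(snp1 = c(0, 1, 2, 0, 1, 2),
              snp2 = c(1, 1, 1, 1, 1, 1),
              snp3 = c(0, 0, 1, 1, 2, 2))

# Squared correlation as a stand-in LD measure (assumption, not snpStats::ld)
r2 <- suppressWarnings(cor(geno))^2

# Indices of the SNP pairs (upper triangle only) whose value is missing
missing_pairs <- which(is.na(r2) & upper.tri(r2), arr.ind = TRUE)
```

Here missing_pairs lists the pairs (1, 2) and (2, 3), both of which involve the monomorphic SNP.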

If x is of class SnpMatrix or matrix, it is assumed to be an n x p matrix of p genotypes for n individuals. This input is converted to an LD similarity matrix using snpStats::ld. If x is of class dgCMatrix, it is assumed to be a (squared) LD matrix.

Clustering on an LD similarity other than "R.squared" or "" can be performed by providing the LD values directly as argument x. These values are expected to be in [0,1]; values outside this range are truncated to [0,1].
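To make the truncation concrete, here is a minimal sketch of clipping the stored entries of a dgCMatrix to [0,1] before passing it as x. The matrix ld_raw and its values are hypothetical, and whether snpClust truncates internally in exactly this way is an assumption; the sketch only shows what truncation to [0,1] means for a sparse matrix.

```r
library(Matrix)

# Hypothetical LD-like values for 4 SNPs, some falling outside [0, 1]
ld_raw <- sparseMatrix(i = c(1, 2, 3, 1),
                       j = c(2, 3, 4, 3),
                       x = c(0.8, 1.2, -0.1, 0.4),
                       dims = c(4, 4))

# Clip the stored (non-zero) entries to [0, 1]; the sparsity pattern is unchanged
ld_clipped <- ld_raw
ld_clipped@x <- pmin(pmax(ld_clipped@x, 0), 1)
```

Clipping beforehand makes the values actually used for clustering explicit, rather than relying on truncation inside the function.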


An object of class chac when no LD value is missing. Otherwise, the indices of the SNP pairs that yield missing LD values are returned (see Details).


## a very small example
if (requireNamespace("snpStats", quietly = TRUE)) {
  data(testdata, package = "snpStats")

  # input as snpStats::SnpMatrix
  fit1 <- snpClust(Autosomes[1:200, 1:5], h = 3, stats = "R.squared")

  # input as base::matrix
  fit2 <- snpClust(as.matrix(Autosomes[1:200, 1:5]), h = 3, stats = "R.squared")

  # input as Matrix::dgCMatrix
  ldres <- snpStats::ld(Autosomes[1:200, 1:5], depth = 3, stats = "R.squared", symmetric = TRUE)
  fit3 <- snpClust(ldres, h = 3)
}

