bdm.optk.s2nr {bigMap}    R Documentation

Find the optimal number of clusters based on the signal-to-noise ratio.

Description

Recursively merges clusters, choosing at each step the merge with the minimum loss of signal-to-noise ratio (S2NR). The S2NR is the ratio of explained to unexplained variance, measured in the high-dimensional space for the given low-dimensional clustering. Merging continues until only 2 clusters remain, and the S2NR is measured at each step.
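
As a rough illustration of the explained/unexplained variance ratio, the sketch below computes a between-cluster over within-cluster sum-of-squares ratio in base R. The helper name s2nr() and this exact formula are assumptions made for illustration; they are not taken from the bigMap sources, and the package may normalise the measure differently.

```r
# Hypothetical helper (not part of bigMap): S2NR taken as
# between-cluster sum of squares / within-cluster sum of squares.
s2nr <- function(X, labels) {
  X <- as.matrix(X)
  # total sum of squares around the grand mean
  total <- sum(sweep(X, 2, colMeans(X))^2)
  # within-cluster sum of squares, accumulated cluster by cluster
  within <- 0
  for (k in unique(labels)) {
    Xk <- X[labels == k, , drop = FALSE]
    within <- within + sum(sweep(Xk, 2, colMeans(Xk))^2)
  }
  # explained (between) over unexplained (within)
  (total - within) / within
}

# Toy data: two well-separated groups in 3 dimensions.
set.seed(1)
X <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
           matrix(rnorm(150, mean = 5), ncol = 3))
good <- rep(1:2, each = 50)    # labels that follow the true groups
bad  <- rep(1:2, times = 50)   # labels that ignore the true groups

s2nr(X, good)   # high: the clustering explains most of the variance
s2nr(X, bad)    # close to zero: the clustering explains almost nothing
```

A clustering that follows the structure of the data yields a much higher ratio than an arbitrary labelling, which is the property the merging procedure relies on.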

Usage

bdm.optk.s2nr(bdm, info = T, plot.optk = T, ret.optk = F,
  layer = 1)

Arguments

bdm

A clustered bdm instance (i.e. all up-stream steps performed: bdm.ptsne(), bdm.pakde() and bdm.wtt()).

info

Logical value. If TRUE, all merging steps are shown (default value is info = TRUE).

plot.optk

Logical value. If TRUE, this function plots the heuristic measure versus the number of clusters (default value is plot.optk = TRUE).

ret.optk

Logical value. For large datasets this computation can take a while, so it may be worth saving the result. If TRUE, the function returns a copy of the bdm instance with the values of S2NR attached as bdm$optk (default value is ret.optk = FALSE).

layer

The bdm$ptsne layer to be used (default value is layer = 1).

Details

The rationale behind this heuristic is that neighbouring clusters in the embedding correspond to nearby clusters in the high-dimensional space, i.e. it is a merging heuristic based on the spatial distribution of clusters. For each cluster (child cluster) we choose the neighbouring cluster with the steepest gradient along their common border (father cluster). This gives a set of child/father pairs as candidate mergings. Given this set of candidates, merging is performed recursively, choosing at each step the child/father pair that results in the minimum loss of S2NR.

Typically, some clusters dominate all of their neighbouring clusters. These clusters have no father. Thus, once all candidate mergings have been performed, we reach a blocked state where only the dominant clusters remain. This situation identifies a hierarchy level in the clustering. When it is reached, the algorithm starts a new merging round, identifying the child/father relations at that level of the hierarchy. The process stops when only two clusters remain.

Usually, the clustering hierarchy is clearly depicted by singular points in the S2NR function. This is a hint that the low-dimensional clustering configuration is an image of a hierarchical spatial configuration in the high-dimensional space. See bdm.optk.plot().
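
The greedy step described above (merge the pair with the minimum loss of S2NR, record the value, repeat until two clusters remain) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the bigMap implementation: s2nr() and merge_path() are hypothetical names, S2NR is taken as between-cluster over within-cluster sum of squares, and every pair of clusters is treated as a candidate, so the child/father selection by border gradient and the hierarchy rounds are not modelled.

```r
# Hypothetical S2NR: between-cluster over within-cluster sum of squares.
s2nr <- function(X, labels) {
  total <- sum(sweep(X, 2, colMeans(X))^2)
  within <- sum(vapply(unique(labels), function(k) {
    Xk <- X[labels == k, , drop = FALSE]
    sum(sweep(Xk, 2, colMeans(Xk))^2)
  }, numeric(1)))
  (total - within) / within
}

# Greedy merging: at each step, merge the pair of clusters whose merge
# keeps S2NR highest (i.e. loses the least), until 2 clusters remain.
# Simplification: all pairs are candidates (no child/father restriction).
merge_path <- function(X, labels) {
  path <- data.frame(k = length(unique(labels)), s2nr = s2nr(X, labels))
  while (length(unique(labels)) > 2) {
    pairs <- combn(sort(unique(labels)), 2)
    # S2NR that would result from merging each candidate pair
    scores <- apply(pairs, 2, function(p) {
      trial <- labels
      trial[trial == p[2]] <- p[1]
      s2nr(X, trial)
    })
    best <- pairs[, which.max(scores)]
    labels[labels == best[2]] <- best[1]
    path <- rbind(path,
                  data.frame(k = length(unique(labels)), s2nr = max(scores)))
  }
  path
}

# Toy data: four tight groups arranged as two well-separated pairs.
set.seed(42)
centers <- c(0, 1, 10, 11)
labels <- rep(1:4, each = 30)
X <- matrix(rnorm(240, sd = 0.3), ncol = 2) + centers[labels]

path <- merge_path(X, labels)
path   # one row per configuration: number of clusters k and its S2NR
```

Plotting path$s2nr against path$k gives a curve of the kind this function plots; sharp changes in its slope play the role of the singular points mentioned above.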

Value

None if ret.optk = FALSE. Otherwise, a copy of the input bdm instance with a new element bdm$optk (a matrix).

Examples


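# --- load mapped dataset (provides exMap)
bdm.example()
# --- compute the S2NR at each merging step and plot it (result not attached)
bdm.optk.s2nr(exMap, plot.optk = TRUE, ret.optk = FALSE)
# --- to attach the computation as exMap$optk, set ret.optk = TRUE and assign:
# exMap <- bdm.optk.s2nr(exMap, plot.optk = TRUE, ret.optk = TRUE)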

[Package bigMap version 2.3.1 Index]