
tsne2clus {MOSS}

R Documentation

t-Stochastic Neighbor Embedding to Clusters

Description

Finds clusters on a two-dimensional map using density-based spatial clustering of applications with noise (DBSCAN; Ester et al. 1996).
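For readers unfamiliar with DBSCAN, the sketch below illustrates the core idea in base R. A point is a core point if at least `minPts` points (itself included) lie within distance `eps`. Clusters grow outward from core points, and points reachable from no core point keep the label 0 (noise). This is an illustrative reimplementation with a hypothetical function name, `dbscan_sketch`; it is not the code tsne2clus runs (the reference list points to the dbscan package by Hahsler and Piekenbrock).

```r
# Hypothetical helper (not part of MOSS): minimal DBSCAN in base R.
# Returns integer cluster labels; 0 marks noise.
dbscan_sketch <- function(X, eps, minPts = 5) {
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  n <- nrow(D)
  cluster <- integer(n)              # 0 = noise / not yet assigned
  visited <- logical(n)
  cid <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    nb <- which(D[i, ] <= eps)       # eps-neighbourhood, includes i itself
    if (length(nb) < minPts) next    # not a core point; may still become a border point later
    cid <- cid + 1L
    cluster[i] <- cid
    queue <- setdiff(nb, i)
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) cluster[j] <- cid   # claim an unassigned or border point
      if (!visited[j]) {
        visited[j] <- TRUE
        nb_j <- which(D[j, ] <= eps)
        if (length(nb_j) >= minPts) {           # j is also a core point: keep expanding
          queue <- c(queue, nb_j[cluster[nb_j] == 0L])
        }
      }
    }
  }
  cluster
}

# Two well-separated blobs plus one far-away outlier.
set.seed(1)
X <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
           matrix(rnorm(40, 5, 0.1), ncol = 2),
           c(20, 20))
table(dbscan_sketch(X, eps = 0.5, minPts = 3))
```

The outlier in row 41 ends up with label 0, which is how DBSCAN separates noise from clusters without fixing the number of clusters in advance.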

Usage

tsne2clus(
  S.tsne,
  ann = NULL,
  labels,
  aest = NULL,
  eps_res = 100,
  eps_range = c(0, 4),
  min.clus.size = 10,
  group.names = "Groups",
  xlab = "x: tSNE(X)",
  ylab = "y: tSNE(X)",
  clus = TRUE
)

Arguments

`S.tsne`	Output of the function "pca2tsne".
`ann`	Subjects' annotation data: an incidence matrix assigning subjects to classes of biological relevance. Used to tune the cluster assignment via the Biological Homogeneity Index (BHI). If ann=NULL, the number of clusters is tuned with the Silhouette index instead of BHI. Defaults to NULL.
`labels`	Character vector with labels describing subjects. Used to assign aesthetics to the visual display of clusters.
`aest`	Data frame containing point shapes and colors. Defaults to NULL.
`eps_res`	Number of eps values to explore within the range given by 'eps_range'. Defaults to 100.
`eps_range`	Vector containing the minimum and maximum eps values to be explored. Defaults to c(0, 4).
`min.clus.size`	Minimum size for a cluster to appear in the visual display. Defaults to 10.
`group.names`	Title for the legend key if 'aest' is specified. Defaults to "Groups".
`xlab`	Label for the x axis. Defaults to "x: tSNE(X)".
`ylab`	Label for the y axis. Defaults to "y: tSNE(X)".
`clus`	Logical; should clustering be performed? Defaults to TRUE. If FALSE, only point aesthetics are applied.
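One plausible reading of `eps_range`, `eps_res` and `min.clus.size` is sketched below. It assumes the eps values are an evenly spaced grid and that clusters smaller than `min.clus.size` are relabelled as noise (0) before plotting; both points are assumptions, and `eps_grid` and `hide_small_clusters` are hypothetical helpers, not MOSS functions.

```r
# Hypothetical helper: evenly spaced grid of eps candidates (assumed spacing).
eps_grid <- function(eps_range = c(0, 4), eps_res = 100) {
  seq(eps_range[1], eps_range[2], length.out = eps_res)
}

# Hypothetical helper: relabel clusters with fewer than min.clus.size members as 0
# so they are not shown as clusters in a plot (assumed behaviour).
hide_small_clusters <- function(cluster, min.clus.size = 10) {
  sizes <- table(cluster)
  small <- as.numeric(names(sizes)[sizes < min.clus.size])
  cluster[cluster %in% small] <- 0
  cluster
}

head(eps_grid(c(0, 4), 100))
hide_small_clusters(c(1, 1, 1, 2, 2, 0), min.clus.size = 3)  # 1 1 1 0 0 0
```

A finer grid (larger `eps_res`) explores eps more densely at the cost of running the clustering more times.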

Details

The function takes the output of pca2tsne (or a list containing any two-column matrix) and finds clusters via DBSCAN. It extends code from the MEREDITH (Taskesen et al. 2016) and clValid (Datta & Datta, 2018) R packages to tune the DBSCAN parameters with the Silhouette or Biological Homogeneity index.
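The tuning step can be pictured as a scan over eps values that keeps the one with the highest mean silhouette width. The sketch below is a hedged illustration, not the MOSS code: `tune_eps_sil` is a hypothetical name, and as a stand-in for DBSCAN it cuts a single-linkage tree at height eps, which groups points the same way DBSCAN does with minPts = 1 (every point is a core point). It uses `silhouette()` from the cluster package, which is loaded in the Examples.

```r
library(cluster)  # silhouette()

# Hypothetical helper (not part of MOSS): choose the eps maximising the mean
# silhouette width. Single-linkage cut at height eps stands in for DBSCAN with minPts = 1.
tune_eps_sil <- function(X, eps_range = c(0, 4), eps_res = 100) {
  d <- dist(X)
  n <- attr(d, "Size")
  tree <- hclust(d, method = "single")
  grid <- seq(eps_range[1], eps_range[2], length.out = eps_res)
  sil <- sapply(grid, function(eps) {
    cl <- cutree(tree, h = eps)
    k <- length(unique(cl))
    # silhouette widths are only defined for 2 <= k <= n - 1 clusters
    if (k < 2 || k >= n) return(NA_real_)
    mean(silhouette(cl, d)[, "sil_width"])
  })
  if (all(is.na(sil))) stop("no eps value in the grid gave between 2 and n - 1 clusters")
  best <- which.max(sil)  # first eps reaching the maximum; NAs are ignored
  list(cluster = cutree(tree, h = grid[best]), eps = grid[best], SIL = sil[best])
}

set.seed(1)
X <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
           matrix(rnorm(40, 5, 0.1), ncol = 2))
res <- tune_eps_sil(X)
table(res$cluster)
res$eps
```

When an annotation matrix is available, the same scan could maximise BHI instead of the silhouette width, which is the alternative the Details paragraph describes.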

Value

A list with the results of the DBSCAN clustering and (if argument 'plot'=TRUE) the corresponding graphical displays.

dbscan.res: a list with the results of the DBSCAN clustering, containing:
- cluster: Cluster partition.
- eps: Optimal eps according to the Silhouette or Biological Homogeneity index criterion.
- SIL: Maximum peak in the trajectory of the Silhouette index.
- BHI: Maximum peak in the trajectory of the Biological Homogeneity index.
clusters.plot: A ggplot object with the clusters' graphical display.
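The BHI reported above follows the idea in Datta and Datta (2006): for each cluster, take the fraction of pairs of annotated subjects that share at least one annotation class, then average over clusters. The base-R sketch below assumes `ann` is a 0/1 incidence matrix like the one built in the Examples and that noise points (label 0) are ignored; `bhi_sketch` is a hypothetical name, and the clValid implementation may differ in details.

```r
# Hypothetical helper (not part of MOSS or clValid): Biological Homogeneity Index.
# cluster: integer labels (0 = noise, ignored); ann: 0/1 incidence matrix (subjects x classes).
bhi_sketch <- function(cluster, ann) {
  A <- (as.matrix(ann) > 0) + 0               # numeric 0/1 incidence matrix
  share <- tcrossprod(A) > 0                  # share[i, j]: i and j have a class in common
  annotated <- rowSums(A) > 0                 # subjects with at least one class
  per_cluster <- sapply(unique(cluster[cluster != 0]), function(k) {
    idx <- which(cluster == k & annotated)
    n <- length(idx)
    if (n < 2) return(NA_real_)               # no pairs to compare
    s <- share[idx, idx]
    (sum(s) - n) / (n * (n - 1))              # drop the n diagonal entries, keep ordered pairs
  })
  mean(per_cluster, na.rm = TRUE)
}

ann <- model.matrix(~ -1 + Species, data = iris)
bhi_sketch(as.integer(iris$Species), ann)  # 1: every cluster is a single species
bhi_sketch(rep(1L, 150), ann)              # 7350 / 22350, about 0.329
```

BHI is bounded between 0 and 1, with higher values meaning that clusters gather subjects from the same annotated classes.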

References

Ester, Martin, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise." In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 226-231.
Hahsler, Michael, and Matthew Piekenbrock. 2017. "dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms." https://cran.r-project.org/package=dbscan.
Datta, Susmita, and Somnath Datta. 2006. "Methods for Evaluating Clustering Algorithms for Gene Expression Data Using a Reference Set of Functional Classes." BMC Bioinformatics 7 (1): 397.
Taskesen, Erdogan, Sjoerd M. H. Huisman, Ahmed Mahfouz, Jesse H. Krijthe, Jeroen de Ridder, Anja van de Stolpe, Erik van den Akker, Wim Verhaegh, and Marcel J. T. Reinders. 2016. "Pan-Cancer Subtyping in a 2D-Map Shows Substructures That Are Driven by Specific Combinations of Molecular Characteristics." Scientific Reports 6 (1): 24949.

Examples


library(MOSS)
library(viridis)
library(cluster)
library(annotate)

# Using the 'iris' data to show cluster definition via the BHI criterion.
set.seed(42)
data(iris)
# Scaling columns.
X <- scale(iris[, -5])
# Calling pca2tsne to map the four variables onto a 2-D map.
Z <- pca2tsne(X, perp = 30, n.samples = 1, n.iter = 1000)
# Using 'Species' as prior knowledge to identify clusters.
ann <- model.matrix(~ -1 + iris[, 5])
# Getting clusters.
tsne2clus(Z,
  ann = ann,
  labels = iris[, 5],
  aest = aest.f(iris[, 5]),
  group.names = "Species",
  eps_range = c(0, 3)
)

# Example of usage within moss.
set.seed(43)
sim_blocks <- simulate_data()$sim_blocks
out <- moss(sim_blocks[-4],
  tSNE = TRUE,
  cluster = list(eps_range = c(0, 4), eps_res = 100, min_clus_size = 1),
  plot = TRUE
)
out$clus_plot
out$clusters_vs_PCs

[Package MOSS version 0.2.2 Index]