tsne2clus {MOSS}    R Documentation

t-Stochastic Neighbor Embedding to Clusters

Description

Finds clusters on a two-dimensional map using density-based spatial clustering of applications with noise (DBSCAN; Ester et al. 1996).

Usage

tsne2clus(
  S.tsne,
  ann = NULL,
  labels,
  aest = NULL,
  eps_res = 100,
  eps_range = c(0, 4),
  min.clus.size = 10,
  group.names = "Groups",
  xlab = "x: tSNE(X)",
  ylab = "y: tSNE(X)",
  clus = TRUE
)

Arguments

S.tsne

Output of the function "pca2tsne".

ann

Subjects' annotation data. An incidence matrix assigning subjects to classes of biological relevance. Meant to tune cluster assignment via the Biological Homogeneity Index (BHI). If ann = NULL, the number of clusters is tuned with the Silhouette index instead of BHI. Defaults to NULL.

labels

Character vector with labels describing subjects. Meant to assign aesthetics to the visual display of clusters.

aest

Data frame containing points shape and color. Defaults to NULL.

eps_res

Number of eps values to explore within the specified range. Defaults to 100.

eps_range

Vector containing the minimum and maximum eps values to be explored. Defaults to c(0, 4).

min.clus.size

Minimum size for a cluster to appear in the visual display. Defaults to 10.

group.names

Title of the legend key when 'aest' is specified. Defaults to "Groups".

xlab

Label for the x-axis. Defaults to "x: tSNE(X)".

ylab

Label for the y-axis. Defaults to "y: tSNE(X)".

clus

Should clustering be performed? Defaults to TRUE. If FALSE, only point aesthetics are applied.
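
As an illustration of the inputs described above, the following base-R sketch builds an incidence matrix suitable for 'ann' and a grid of candidate eps values. The names 'species' and 'eps_grid' are made up for this example, and the equally spaced grid is an assumption about how 'eps_range' and 'eps_res' combine, not a statement about the function's internals.

```r
# Incidence matrix for 'ann': one row per subject, one 0/1 column per class.
species <- iris$Species
ann <- model.matrix(~ -1 + species)

# Assumed eps grid: 'eps_res' equally spaced values spanning 'eps_range'.
eps_range <- c(0, 4)
eps_res <- 100
eps_grid <- seq(eps_range[1], eps_range[2], length.out = eps_res)

dim(ann)         # subjects x classes
range(eps_grid)  # end points of the explored range
```

Each row of 'ann' contains a single 1, marking the class the subject belongs to.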

Details

The function takes the output of pca2tsne (or a list containing any two-column matrix) and finds clusters via DBSCAN. It extends code from the MEREDITH (Taskesen et al. 2016) and clValid (Datta & Datta, 2018) R packages to tune DBSCAN parameters with the Silhouette or Biological Homogeneity indexes.
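
The tuning idea described above can be sketched outside of MOSS as follows. This is a minimal illustration, not the package's implementation: it assumes the 'dbscan' and 'cluster' packages are installed, uses a simulated two-column map ('map' and 'centers' are hypothetical names), fixes minPts at 5, drops noise points before scoring, and picks the eps value with the highest average silhouette width.

```r
library(dbscan)   # dbscan()
library(cluster)  # silhouette()

set.seed(1)
# Hypothetical 2-D map: three well-separated Gaussian blobs.
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
map <- do.call(rbind, lapply(1:3, function(k) {
  cbind(rnorm(50, centers[k, 1], 0.3), rnorm(50, centers[k, 2], 0.3))
}))

# Candidate eps values (eps = 0 is dropped because it cannot form clusters).
eps_grid <- seq(0, 4, length.out = 100)[-1]

# Average silhouette width of the non-noise points for each eps.
score <- vapply(eps_grid, function(eps) {
  cl <- dbscan(map, eps = eps, minPts = 5)$cluster
  keep <- cl > 0  # dbscan labels noise points as cluster 0
  if (length(unique(cl[keep])) < 2) return(NA_real_)
  sil <- silhouette(cl[keep], dist(map[keep, , drop = FALSE]))
  mean(sil[, "sil_width"])
}, numeric(1))

best_eps <- eps_grid[which.max(score)]
best <- dbscan(map, eps = best_eps, minPts = 5)
table(best$cluster)
```

Values of eps that yield fewer than two clusters receive no score, since the silhouette is undefined in that case. When annotation data are available, the same loop could score each partition with a homogeneity index instead.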

Value

A list with the results of the DBSCAN clustering and (if argument 'plot'=TRUE) the corresponding graphical displays.

References

Examples


library(MOSS)
library(viridis)
library(cluster)
library(annotate)

# Using the 'iris' data to show cluster definition via the BHI criterion.
set.seed(42)
data(iris)
# Scaling columns.
X <- scale(iris[, -5])
# Calling pca2tsne to map the four variables onto a 2-D map.
Z <- pca2tsne(X, perp = 30, n.samples = 1, n.iter = 1000)
# Using 'species' as prior knowledge to identify clusters.
ann <- model.matrix(~ -1 + iris[, 5])
# Getting clusters.
tsne2clus(Z,
  ann = ann,
  labels = iris[, 5],
  aest = aest.f(iris[, 5]),
  group.names = "Species",
  eps_range = c(0, 3)
)

# Example of usage within moss.
set.seed(43)
sim_blocks <- simulate_data()$sim_blocks
out <- moss(sim_blocks[-4],
  tSNE = TRUE,
  cluster = list(eps_range = c(0, 4), eps_res = 100, min_clus_size = 1),
  plot = TRUE
)
out$clus_plot
out$clusters_vs_PCs


[Package MOSS version 0.2.2 Index]