tSNE {ProjectionBasedClustering}R Documentation

T-distributed Stochastic Neighbor Embedding (t-SNE)

Description

T-distributed Stochastic Neighbor Embedding res = tSNE(Data, KNN=30,OutputDimension=2)

Usage

tSNE(DataOrDistances,k,OutputDimension=2,Algorithm='tsne_cpp',

method="euclidean",Whitening=FALSE, Iterations=1000,PlotIt=FALSE,Cls,num_threads=1,...)

Arguments

DataOrDistances

Numerical matrix defined as either

Data, i.e., [1:n,1:d], nonsymmetric, and consists of n cases of d-dimensional data points with every case having d attributes, variables or features,

or

Distances, i.e.,[1:n,1:n], symmetric and consists of n cases, e.g., as.matrix(dist(Data,method))

k

number of k nearest neighbors=number of effective nearest neighbors("perplexity"); Important parameter. If not given, settings of packages of t-SNE will be used depending Algorithm

OutputDimension

Number of dimensions in the Outputspace, default=2

Algorithm

'tsne_cpp': T-Distributed Stochastic Neighbor Embedding using a Barnes-HutImplementation in C++ of Rtsne. Requires Version >= 0.15 of Rtsne for multicore parallelisation.

'tsne_opt_cpp': T-Distributed Stochastic Neighbor Embedding with automated optimized parameters using a Barnes-HutImplementation in C++ of [Ulyanov, 2016].

'tsne_r': pure R implementation of the t-SNE algorithm of of tsne

method

method specified by distance string: 'euclidean','cityblock=manhatten','cosine','chebychev','jaccard','minkowski','manhattan','binary'

Whitening

A boolean value indicating whether the matrix data should be whitened (tsne_r) or if pca should be used priorly (tsne_cpp)

Iterations

maximum number of iterations to perform.

PlotIt

Default: FALSE, If TRUE: Plots the projection as a 2d visualization. OutputDimension>2: only the first two dimensions will be shown

Cls

[1:n,1] Optional,: only relevant if PlotIt=TRUE. Numeric vector, given Classification in numbers: every element is the cluster number of a certain corresponding element of data.

num_threads

Number of threads for parallel computation, only usable for Algorithm='tsne_cpp' or 'tsne_opt_cpp'

...

Further arguments passed on to either 'Rtsne' or 'tsne'

Details

An short overview of different types of projection methods can be found in [Thrun, 2018, p.42, Fig. 4.1], doi:10.1007/978-3-658-20540-9.

Value

List of

ProjectedPoints

[1:n,OutputDimension], n by OutputDimension matrix containing coordinates of the Projection

ModelObject

NULL for tsne_r, further information if tsne_cpp is selected

Note

A wrapper for Rtsne (Algorithm='tsne_cpp'),

Multicore-opt-tSNE (Algorithm='tsne_opt_cpp'),

or for tsne (Algorithm='tsne_r')

You can use the standard ShepardScatterPlot or the better approach through the ShepardDensityPlot of the CRAN package DataVisualizations.

Author(s)

Michael Thrun, Luca Brinkmann

References

Anna C. Belkina, Christopher O. Ciccolella, Rina Anno, Josef Spidlen, Richard Halpert, Jennifer Snyder-Cappione: Automated optimal parameters for T-distributed stochastic neighbor embedding improve visualization and allow analysis of large datasets, bioRxiv 451690, doi: https://doi.org/10.1101/451690, 2018.

L.J.P van der Maaten: Accelerating t-SNE using tree-based algorithms, Journal of Machine Learning Research 15.1:3221-3245, 2014.

Ulyanov, Dmitry: Multicore-TSNE, GitHub repository URL https://github.com/DmitryUlyanov/Multicore-TSNE, 2016.

Examples

data('Hepta')
Data=Hepta$Data

## Not run: 
Proj=tSNE(Data,k=7)

PlotProjectedPoints(Proj$ProjectedPoints,Hepta$Cls)

## End(Not run)


[Package ProjectionBasedClustering version 1.2.1 Index]