
Spectrum {Spectrum}

R Documentation

Spectrum: Fast Adaptive Spectral Clustering for Single and Multi-view Data

Description

Spectrum is a self-tuning spectral clustering method for single or multi-view data. It uses a new type of adaptive, density-aware kernel that strengthens connections between points that share common nearest neighbours in the graph. To integrate multi-view data and reduce noise, Spectrum applies a tensor product graph data integration and diffusion procedure. To determine the number of clusters, it analyses either the gaps between eigenvalues (eigengap method) or the multimodality of the eigenvectors (multimodality gap method). Spectrum is suited to a wide range of data, including both Gaussian and non-Gaussian cluster structures.
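The kernel idea above can be illustrated in base R. This is a minimal sketch, not the package's implementation: `adaptive_kernel()` and its `use_cnn` flag are hypothetical, and the kernel form is an assumption modelled on the description, namely A[i, j] = exp(-d(i, j)^2 / (sigma_i * sigma_j * (CNN(i, j) + 1))), where sigma_i is the distance from point i to its NN-th nearest neighbour and CNN(i, j) counts the neighbours (among the NN2 nearest) that i and j share. Dropping the CNN term gives the Zelnik-Manor self-tuning kernel exp(-d(i, j)^2 / (sigma_i * sigma_j)), which corresponds to kerneltype = 'stsc'.

```r
# Hypothetical helper, NOT part of the Spectrum API. Assumed kernel form:
#   A[i, j] = exp(-d(i, j)^2 / (sigma_i * sigma_j * (CNN(i, j) + 1)))
# With use_cnn = FALSE the CNN term is dropped (Zelnik-Manor self-tuning kernel).
adaptive_kernel <- function(X, NN = 3, NN2 = 7, use_cnn = TRUE) {
  # X has features in rows and samples in columns, like Spectrum's `data`
  D <- as.matrix(dist(t(X)))            # pairwise Euclidean distances between samples
  n <- nrow(D)
  ord <- t(apply(D, 1, order))          # row i: sample indices sorted by distance to i
  # column 1 of ord is the sample itself (distance 0), so the
  # NN-th nearest neighbour is in column NN + 1
  sigma <- D[cbind(seq_len(n), ord[, NN + 1])]
  CNN <- matrix(0, n, n)
  if (use_cnn) {
    M <- matrix(0, n, n)                # M[i, k] = 1 if k is one of i's NN2 nearest neighbours
    for (i in seq_len(n)) M[i, ord[i, 2:(NN2 + 1)]] <- 1
    CNN <- M %*% t(M)                   # number of nearest neighbours shared by i and j
  }
  A <- exp(-D^2 / (outer(sigma, sigma) * (CNN + 1)))
  diag(A) <- 0                          # no self-loops
  A
}

# Two well-separated 2-D groups of 20 samples each (samples as columns)
set.seed(1)
X <- cbind(matrix(rnorm(40), nrow = 2),
           matrix(rnorm(40, mean = 5), nrow = 2))
A_density <- adaptive_kernel(X)
A_stsc <- adaptive_kernel(X, use_cnn = FALSE)
```

Because CNN(i, j) >= 0, the density-aware affinities are never smaller than the self-tuning ones; pairs with many shared neighbours (typically points in the same cluster) are pulled closer together.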

Usage

Spectrum(data, method = 1, silent = FALSE, showres = TRUE,
  diffusion = TRUE, kerneltype = c("density", "stsc"), maxk = 10,
  NN = 3, NN2 = 7, showpca = FALSE, frac = 2, thresh = 7,
  fontsize = 18, dotsize = 3, tunekernel = FALSE,
  clusteralg = "GMM", FASP = FALSE, FASPk = NULL, fixk = NULL,
  krangemax = 10, runrange = FALSE, diffusion_iters = 4,
  KNNs_p = 10, missing = FALSE)

Arguments

`data`	Data frame or list of data frames: the data to cluster, with samples (points) as columns and features as rows. For multi-view data, supply a list of data frames whose columns (samples) are in the same order.
`method`	Numerical value: 1 = default eigengap method (Gaussian clusters), 2 = multimodality gap method (Gaussian or non-Gaussian clusters), 3 = no automatic selection of K (set K with the fixk param)
`silent`	Logical flag: whether to turn off messages
`showres`	Logical flag: whether to show the results on the screen
`diffusion`	Logical flag: whether to perform graph diffusion to reduce noise (default=TRUE)
`kerneltype`	Character string: 'density' (default) = adaptive density aware kernel, 'stsc' = Zelnik-Manor self-tuning kernel
`maxk`	Numerical value: the maximum number of expected clusters (default=10). This is data dependent; do not set it excessively high.
`NN`	Numerical value: kernel param, the number of nearest neighbours used to set the sigma (local scale) parameters (default=3)
`NN2`	Numerical value: kernel param, the number of nearest neighbours used to count common nearest neighbours (default=7)
`showpca`	Logical flag: whether to show a PCA plot when running on a single view
`frac`	Numerical value: optk search param, fraction to find the last substantial drop (multimodality gap method param)
`thresh`	Numerical value: optk search param, how many points ahead to keep searching (multimodality gap method param)
`fontsize`	Numerical value: controls font size of the ggplot2 plots
`dotsize`	Numerical value: controls the dot size of the ggplot2 plots
`tunekernel`	Logical flag: whether to tune the kernel; only applies to method 2 (default=FALSE)
`clusteralg`	Character string: clustering algorithm for the eigenvector matrix ('GMM' = Gaussian mixture model, the default; 'km' = k-means)
`FASP`	Logical flag: whether to use Fast Approximate Spectral Clustering (for very large numbers of samples)
`FASPk`	Numerical value: the number of centroids to compute when doing FASP
`fixk`	Numerical value: the number of clusters K to use when performing spectral clustering without automatic selection of K; set this parameter and set method to 3
`krangemax`	Numerical value: the maximum K value to iterate towards when running a range of K
`runrange`	Logical flag: whether to run a range of K (default=FALSE); the results for the Kth value are placed in the Kth element of the returned list
`diffusion_iters`	Numerical value: number of diffusion iterations for the graph (default=4)
`KNNs_p`	Numerical value: number of KNNs when making KNN graph (default=10, suggested=10-20)
`missing`	Logical flag: whether to impute missing data in multi-view analysis (default=FALSE)
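The `diffusion`, `diffusion_iters` and `KNNs_p` arguments control graph diffusion. The sketch below is a generic tensor product graph diffusion in base R, using the iteration Q <- S Q S^T + I from the tensor product graph literature; it is an assumption that Spectrum's procedure works this way, and `tpg_diffuse()` and its `alpha` damping factor are hypothetical. Each row of the similarity matrix is first sparsified to its `KNNs_p` strongest edges, then the graph is symmetrised and scaled so every row sums to `alpha` < 1, which keeps the spectral radius of S below 1 so that the iteration converges as the number of iterations grows.

```r
# Hypothetical sketch of tensor product graph (TPG) diffusion; NOT Spectrum's own code.
tpg_diffuse <- function(A, KNNs_p = 10, diffusion_iters = 4, alpha = 0.99) {
  n <- nrow(A)
  # keep only each row's KNNs_p strongest edges (kNN sparsification)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nb <- order(A[i, ], decreasing = TRUE)[seq_len(KNNs_p)]
    S[i, nb] <- A[i, nb]
  }
  S <- (S + t(S)) / 2                   # symmetrise the kNN graph
  S <- alpha * S / rowSums(S)           # each row now sums to alpha < 1
  Q <- diag(n)
  for (iter in seq_len(diffusion_iters)) {
    Q <- S %*% Q %*% t(S) + diag(n)     # Q <- S Q S^T + I
  }
  Q
}

# Two 2-D groups of 30 samples each; a plain Gaussian kernel (sigma = 1)
# stands in for Spectrum's adaptive kernel
set.seed(3)
X <- cbind(matrix(rnorm(60), nrow = 2),
           matrix(rnorm(60, mean = 8), nrow = 2))
A <- exp(-as.matrix(dist(t(X)))^2 / 2)
diag(A) <- 0
Q <- tpg_diffuse(A)
```

Sparsifying before diffusing means that weak, noisy edges are not propagated; each extra iteration adds longer paths through the kNN graph to the final similarity matrix.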

Value

A list containing:
1) cluster assignments, in the same order as the input data columns
2) eigenvector analysis results (either eigenvalues or dip test statistics)
3) the optimal K
4) the final similarity matrix
5) the eigenvectors and eigenvalues of the graph Laplacian
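Several of the returned items come from an eigen-decomposition of the graph. The sketch below illustrates the eigengap idea behind method = 1 in base R; it is not the package's code. It assumes the normalised affinity matrix D^(-1/2) A D^(-1/2), whose leading eigenvectors are the same as those of the symmetric normalised Laplacian I - D^(-1/2) A D^(-1/2), picks K at the largest gap between consecutive eigenvalues up to `maxk`, row-normalises the leading K eigenvectors, and clusters them with k-means (comparable to clusteralg = 'km'; the documented default is GMM). `eigengap_cluster()` is hypothetical, and the plain Gaussian kernel in the usage example stands in for Spectrum's adaptive kernel.

```r
# Hypothetical helper, NOT part of the Spectrum API: eigengap choice of K
# followed by k-means on the row-normalised leading eigenvectors.
eigengap_cluster <- function(A, maxk = 10) {
  maxk <- min(maxk, nrow(A) - 1)
  d <- rowSums(A)                         # node degrees
  L <- A / sqrt(outer(d, d))              # D^(-1/2) A D^(-1/2)
  e <- eigen(L, symmetric = TRUE)         # eigenvalues come back in decreasing order
  vals <- e$values[1:(maxk + 1)]
  gaps <- vals[1:maxk] - vals[2:(maxk + 1)]
  k <- which.max(gaps)                    # K at the largest eigengap
  U <- e$vectors[, 1:k, drop = FALSE]
  U <- U / sqrt(rowSums(U^2))             # project each row onto the unit sphere
  cl <- kmeans(U, centers = k, nstart = 20)$cluster
  list(assignments = cl, eigenvalues = e$values, K = k)
}

# Three 2-D groups of 30 samples each (samples as columns)
set.seed(2)
X <- cbind(matrix(rnorm(60, mean = 0), nrow = 2),        # group near (0, 0)
           matrix(rnorm(60, mean = 8), nrow = 2),        # group near (8, 8)
           matrix(rnorm(60, mean = c(0, 8)), nrow = 2))  # group near (0, 8)
A <- exp(-as.matrix(dist(t(X)))^2 / (2 * 2^2))           # Gaussian kernel, sigma = 2
diag(A) <- 0
res <- eigengap_cluster(A, maxk = 10)
```

For well-separated clusters the first K eigenvalues sit close to 1 and then drop sharply, so the largest gap marks K; here that should be 3.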

Examples

res <- Spectrum(brain[[1]][,1:50])

[Package Spectrum version 1.1 Index]