Spectrum {Spectrum} | R Documentation |
Spectrum: Fast Adaptive Spectral Clustering for Single and Multi-view Data
Description
Spectrum is a self-tuning spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density-aware kernel that strengthens connections between points that share common nearest neighbours in the graph. To integrate multi-view data and reduce noise, a tensor product graph data integration and diffusion procedure is used. To determine the number of clusters, Spectrum analyses either the eigenvalues (eigengap method) or the multimodality of the eigenvectors (multimodality gap method). Spectrum is well suited to a wide range of data, including both Gaussian and non-Gaussian structures.
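To make the idea of a self-tuning kernel concrete, the following is a minimal base-R sketch of the Zelnik-Manor and Perona locally scaled affinity, which is what the 'stsc' kernel type refers to. It is an illustration under stated assumptions and not Spectrum's internal code: the function name self_tuning_kernel and its nn argument are hypothetical, and the density-aware kernel (which additionally uses shared nearest neighbours) is not reproduced here.

```r
# Self-tuning affinity, base-R sketch (hypothetical helper, not part of Spectrum).
# `x` follows the Spectrum convention: features in rows, points to cluster in columns.
self_tuning_kernel <- function(x, nn = 3) {
  # Pairwise Euclidean distances between points (columns of x)
  d <- as.matrix(dist(t(x)))
  # Local scale sigma_i: distance from point i to its nn-th nearest neighbour.
  # Position nn + 1 after sorting, because position 1 is the point itself (distance 0).
  sigma <- apply(d, 1, function(row) sort(row)[nn + 1])
  # A_ij = exp(-d_ij^2 / (sigma_i * sigma_j))
  a <- exp(-d^2 / outer(sigma, sigma))
  diag(a) <- 0  # no self-loops
  a
}

set.seed(1)
# Toy data: 2 features x 20 points, two well-separated groups of 10
x <- cbind(matrix(rnorm(20, mean = 0, sd = 0.3), nrow = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), nrow = 2))
a <- self_tuning_kernel(x, nn = 3)
```

Because each point gets its own scale, dense and sparse regions of the data are treated comparably, which is the motivation for using a local rather than a global sigma.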
Usage
Spectrum(data, method = 1, silent = FALSE, showres = TRUE,
diffusion = TRUE, kerneltype = c("density", "stsc"), maxk = 10,
NN = 3, NN2 = 7, showpca = FALSE, frac = 2, thresh = 7,
fontsize = 18, dotsize = 3, tunekernel = FALSE,
clusteralg = "GMM", FASP = FALSE, FASPk = NULL, fixk = NULL,
krangemax = 10, runrange = FALSE, diffusion_iters = 4,
KNNs_p = 10, missing = FALSE)
Arguments
data |
Data frame or list of data frames: the data, with the points (samples) to cluster as columns and the features as rows. For multi-view data, supply a list of data frames with the samples in the same order in every view. |
method |
Numerical value: 1 = eigengap method (default; Gaussian clusters), 2 = multimodality gap method (Gaussian or non-Gaussian clusters), 3 = no automatic selection of K (see the fixk parameter) |
silent |
Logical flag: whether to turn off messages |
showres |
Logical flag: whether to show the results on the screen |
diffusion |
Logical flag: whether to perform graph diffusion to reduce noise (default=TRUE) |
kerneltype |
Character string: 'density' (default) = adaptive density aware kernel, 'stsc' = Zelnik-Manor self-tuning kernel |
maxk |
Numerical value: the maximum number of expected clusters (default=10). This is data dependent; do not set it excessively high. |
NN |
Numerical value: kernel param, the number of nearest neighbours to use for the sigma parameters (default=3) |
NN2 |
Numerical value: kernel param, the number of nearest neighbours to use for the common nearest neighbours (default=7) |
showpca |
Logical flag: whether to show a PCA plot when running on a single view |
frac |
Numerical value: optk search param, fraction to find the last substantial drop (multimodality gap method param) |
thresh |
Numerical value: optk search param, how many points ahead to keep searching (multimodality gap method param) |
fontsize |
Numerical value: controls font size of the ggplot2 plots |
dotsize |
Numerical value: controls the dot size of the ggplot2 plots |
tunekernel |
Logical flag: whether to tune the kernel, only applies for method 2 (default=FALSE) |
clusteralg |
Character string: clustering algorithm applied to the eigenvector matrix, 'GMM' (default) = Gaussian mixture model or 'km' = k-means |
FASP |
Logical flag: whether to use Fast Approximate Spectral Clustering (for very high sample numbers) |
FASPk |
Numerical value: the number of centroids to compute when doing FASP |
fixk |
Numerical value: the number of clusters K to use when performing spectral clustering without automatic selection of K; set this parameter and set method to 3 |
krangemax |
Numerical value: the maximum K value to iterate towards when running a range of K |
runrange |
Logical flag: whether to run a range of K or not (default=FALSE); if TRUE, the results for each K are stored in the Kth element of the returned list |
diffusion_iters |
Numerical value: number of diffusion iterations for the graph (default=4) |
KNNs_p |
Numerical value: number of KNNs when making KNN graph (default=10, suggested=10-20) |
missing |
Logical flag: whether to impute missing data in multi-view analysis (default=FALSE) |
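As an illustration of what the eigengap method (method = 1) and the maxk cap refer to, the following base-R sketch applies the standard eigengap heuristic to a toy affinity matrix. It is a sketch under stated assumptions and not Spectrum's internal code: the helper eigengap_k, the fixed-width Gaussian kernel and the toy data are all hypothetical, and Spectrum's exact procedure may differ.

```r
# Eigengap heuristic, base-R sketch (hypothetical helper, not part of Spectrum).
# `a` is a symmetric affinity matrix with zero diagonal; `maxk` caps the search.
eigengap_k <- function(a, maxk = 10) {
  deg <- rowSums(a)
  # Symmetric normalisation D^(-1/2) A D^(-1/2)
  l <- a / sqrt(outer(deg, deg))
  # eigen() returns eigenvalues in decreasing order
  ev <- eigen(l, symmetric = TRUE, only.values = TRUE)$values
  ev <- ev[seq_len(min(maxk + 1, length(ev)))]
  gaps <- -diff(ev)  # gaps[k] = ev[k] - ev[k + 1]
  which.max(gaps)    # K with the largest drop after the Kth eigenvalue
}

set.seed(42)
# Toy data: three tight, well-separated groups of 10 points in 2D (points as rows here)
centres <- rbind(c(0, 0), c(10, 0), c(0, 10))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(10, centres[i, 1], 0.3), rnorm(10, centres[i, 2], 0.3))
}))
d <- as.matrix(dist(pts))
a <- exp(-d^2 / 2)  # Gaussian kernel with a fixed width of 1, chosen for the toy data
diag(a) <- 0
k_hat <- eigengap_k(a, maxk = 10)
```

With groups this well separated, the leading eigenvalues sit close to 1 and the largest drop falls after the third, so k_hat matches the number of groups. Setting maxk far above the plausible number of clusters only widens the search over small, noisy gaps.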
Value
A list containing:
1) cluster assignments, in the same order as the input data columns
2) eigenvector analysis results (either eigenvalues or dip test statistics)
3) optimal K
4) final similarity matrix
5) eigenvectors and eigenvalues of the graph Laplacian
Examples
library(Spectrum)
res <- Spectrum(brain[[1]][, 1:50])  # first view of the brain data, first 50 samples
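Two further calls may help show how the arguments above combine. This is a hedged sketch that uses only arguments listed in Usage. It assumes the Spectrum package is installed and that every view in the bundled brain list has at least 50 samples in the same column order. The value fixk = 4 is arbitrary, and the names views, res_fixed and res_multi are illustrative.

```r
library(Spectrum)

# Single view with a fixed number of clusters: method = 3 turns off automatic
# selection of K, and fixk supplies K (4 is an arbitrary choice for illustration).
res_fixed <- Spectrum(brain[[1]][, 1:50], method = 3, fixk = 4,
                      showres = FALSE, silent = TRUE)

# Multi-view: pass a list of data frames with samples as columns, in the same
# order in every view. Here each view is cut down to the same first 50 samples.
views <- lapply(brain, function(v) v[, 1:50])
res_multi <- Spectrum(views, showres = FALSE, silent = TRUE)
```

Setting showres = FALSE and silent = TRUE suppresses the on-screen results and messages, which is convenient when the function is called inside a script.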