diagram_kkmeans {TDApplied} | R Documentation
Cluster a group of persistence diagrams using kernel k-means.
Description
Finds latent cluster labels for a group of persistence diagrams, using a kernelized version of the popular k-means algorithm. An optimal number of clusters may be determined by comparing the withinss field of the clustering object across several values of the centers argument.
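The comparison of withinss across cluster counts can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes the TDAstats package is available for generating example diagrams, the helper name total_withinss and the candidate range 2:3 are hypothetical, and withinss is read directly as a slot of the returned clustering object.

```r
# Hypothetical helper (not part of TDApplied): total within-cluster sum of
# squares of a fitted 'diagram_kkmeans' object, read from the 'withinss' slot.
total_withinss <- function(fit) sum(fit$clustering@withinss)

if (requireNamespace("TDApplied", quietly = TRUE) &&
    requireNamespace("TDAstats", quietly = TRUE)) {
  # six small example diagrams, each from a random sample of a circle
  g <- lapply(1:6, function(i) {
    TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 20), ],
                                 dim = 1, threshold = 2)
  })
  # compute the Gram matrix once and reuse it for every candidate value
  K <- TDApplied::gram_matrix(diagrams = g, dim = 0, t = 2, sigma = 2,
                              num_workers = 2)
  ks <- 2:3  # illustrative candidate numbers of clusters
  wss <- sapply(ks, function(k) {
    fit <- TDApplied::diagram_kkmeans(diagrams = g, K = K, centers = k,
                                      dim = 0, t = 2, sigma = 2)
    total_withinss(fit)
  })
  # inspect how the total changes with the number of clusters
  print(data.frame(centers = ks, total_withinss = wss))
}
```

Because kernel k-means starts from a random initialization, repeated runs may give different totals for the same number of clusters, so it can be worth fitting each candidate more than once before comparing.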
Usage
diagram_kkmeans(
diagrams,
K = NULL,
centers,
dim = 0,
t = 1,
sigma = 1,
rho = NULL,
num_workers = parallelly::availableCores(omit = 1),
...
)
Arguments
- diagrams
a list of n>=2 persistence diagrams which are either the output of a persistent homology calculation like ripsDiag/
- K
an optional precomputed Gram matrix of persistence diagrams, default NULL.
- centers
number of clusters to initialize, no more than the number of diagrams although smaller values are recommended.
- dim
the non-negative integer homological dimension in which the distance is to be computed, default 0.
- t
a positive number representing the scale for the persistence Fisher kernel, default 1.
- sigma
a positive number representing the bandwidth for the Fisher information metric, default 1.
- rho
an optional positive number representing the heuristic for Fisher information metric approximation, see
- num_workers
the number of cores used for parallel computation, default is one less than the number of cores on the machine.
- ...
additional parameters for the
Details
Returns the output of kkmeans (from the kernlab package) on the desired Gram matrix of a group of persistence diagrams in a particular dimension. The additional list elements stored in the output are needed to estimate cluster labels for new persistence diagrams in the 'predict_diagram_kkmeans' function.
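As a rough sketch of how the stored elements are used, the fitted object can be passed on to 'predict_diagram_kkmeans'. The argument names new_diagrams and clustering are assumptions about that function's interface and should be checked against its own help page; the helpers has_prediction_fields and make_diagram are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of TDApplied): check that a fitted object
# carries the list elements described in the Value section.
has_prediction_fields <- function(fit) {
  all(c("clustering", "diagrams", "dim", "t", "sigma") %in% names(fit))
}

if (requireNamespace("TDApplied", quietly = TRUE) &&
    requireNamespace("TDAstats", quietly = TRUE)) {
  # hypothetical convenience function for building one example diagram
  make_diagram <- function() {
    TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 20), ],
                                 dim = 1, threshold = 2)
  }
  D1 <- make_diagram()
  D2 <- make_diagram()
  g <- list(D1, D1, D2, D2)

  clust <- TDApplied::diagram_kkmeans(diagrams = g, centers = 2, dim = 0,
                                      t = 2, sigma = 2, num_workers = 2)
  # the inputs are stored alongside the clustering
  stopifnot(has_prediction_fields(clust))

  # Assumption: predict_diagram_kkmeans() takes 'new_diagrams' and 'clustering'
  new_labels <- TDApplied::predict_diagram_kkmeans(new_diagrams = list(D1, D2),
                                                   clustering = clust,
                                                   num_workers = 2)
  print(new_labels)
}
```

Storing 'diagrams', 'dim', 't' and 'sigma' with the clustering means a new diagram can be compared with the training diagrams under the same kernel settings, without the caller having to supply them again.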
Value
a list of class 'diagram_kkmeans' containing the output of kkmeans
on the Gram matrix, i.e. a list containing the elements
- clustering
an S4 object of class specc, the output of a kkmeans function call. The '.Data' slot of this object contains cluster memberships, 'withinss' contains the within-cluster sum of squares for each cluster, etc.
- diagrams
the input 'diagrams' argument.
- dim
the input 'dim' argument.
- t
the input 't' argument.
- sigma
the input 'sigma' argument.
Author(s)
Shael Brown - shaelebrown@gmail.com
References
Dhillon, I and Guan, Y and Kulis, B (2004). "A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts." https://people.bu.edu/bkulis/pubs/spectral_techreport.pdf.
See Also
predict_diagram_kkmeans
for predicting cluster labels of new diagrams.
Examples
if(require("TDAstats"))
{
# create two diagrams
D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
dim = 1,threshold = 2)
D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
dim = 1,threshold = 2)
g <- list(D1,D1,D2,D2)
# calculate kernel k-means clusters with centers = 2, and sigma = t = 2 in dimension 0
clust <- diagram_kkmeans(diagrams = g,centers = 2,dim = 0,t = 2,sigma = 2,num_workers = 2)
# repeat with a precomputed Gram matrix (same kernel parameters), which is much faster
# because the kernel values are not recomputed
K <- gram_matrix(diagrams = g,dim = 0,num_workers = 2,t = 2,sigma = 2)
cluster <- diagram_kkmeans(diagrams = g,K = K,centers = 2,dim = 0,sigma = 2,t = 2)
}