| diagram_kkmeans {TDApplied} | R Documentation | 
Cluster a group of persistence diagrams using kernel k-means.
Description
Finds latent cluster labels for a group of persistence diagrams, using a kernelized version of the k-means algorithm. An optimal number of clusters may be determined by comparing the withinss slot of the clustering object over several values of the centers argument.
Usage
diagram_kkmeans(
  diagrams,
  K = NULL,
  centers,
  dim = 0,
  t = 1,
  sigma = 1,
  rho = NULL,
  num_workers = parallelly::availableCores(omit = 1),
  ...
)
Arguments
| diagrams | a list of n>=2 persistence diagrams which are either the output of a persistent homology calculation like ripsDiag/ | 
| K | an optional precomputed Gram matrix of persistence diagrams, default NULL. | 
| centers | number of clusters to initialize, no more than the number of diagrams although smaller values are recommended. | 
| dim | the non-negative integer homological dimension in which the distance is to be computed, default 0. | 
| t | a positive number representing the scale for the persistence Fisher kernel, default 1. | 
| sigma | a positive number representing the bandwidth for the Fisher information metric, default 1. | 
| rho | an optional positive number representing the heuristic for Fisher information metric approximation, see  | 
| num_workers | the number of cores used for parallel computation, default is one less than the number of cores on the machine. | 
| ... | additional parameters for the  | 
Details
Returns the output of kkmeans on the desired Gram matrix of a group of persistence diagrams
in a particular dimension. The additional list elements stored in the output are needed
to estimate cluster labels for new persistence diagrams in the 'predict_diagram_kkmeans'
function.
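The workflow described above can be sketched as follows. This is a minimal illustration, assuming TDAstats is installed and that predict_diagram_kkmeans accepts the named arguments new_diagrams and clustering; the helper sample_diagram is hypothetical and exists only for this example.

```r
if (require("TDApplied") && require("TDAstats")) {
  # hypothetical helper: diagram of 20 points sampled from TDAstats::circle2d
  sample_diagram <- function() {
    TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 20), ],
                                 dim = 1, threshold = 2)
  }

  # fit the clustering model on four diagrams
  D1 <- sample_diagram()
  D2 <- sample_diagram()
  g <- list(D1, D1, D2, D2)
  clust <- diagram_kkmeans(diagrams = g, centers = 2, dim = 0,
                           t = 2, sigma = 2, num_workers = 2)

  # the stored diagrams, dim, t and sigma are reused to label new diagrams
  g_new <- list(sample_diagram(), sample_diagram())
  new_labels <- predict_diagram_kkmeans(new_diagrams = g_new,
                                        clustering = clust,
                                        num_workers = 2)
  print(new_labels)
}
```

The kernel parameters do not need to be passed again at prediction time, which is why the fitted object keeps them alongside the clustering.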
Value
a list of class 'diagram_kkmeans' containing the output of kkmeans on the Gram matrix, i.e. a list containing the elements
- clustering: an S4 object of class specc, the output of a kkmeans function call. The '.Data' slot of this object contains cluster memberships, 'withinss' contains the within-cluster sum of squares for each cluster, etc.
- diagrams: the input 'diagrams' argument.
- dim: the input 'dim' argument.
- t: the input 't' argument.
- sigma: the input 'sigma' argument.
Author(s)
Shael Brown - shaelebrown@gmail.com
References
Dhillon, I and Guan, Y and Kulis, B (2004). "A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts." https://people.bu.edu/bkulis/pubs/spectral_techreport.pdf.
See Also
predict_diagram_kkmeans for predicting cluster labels of new diagrams.
Examples
if(require("TDAstats"))
{
  # create two diagrams
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  g <- list(D1,D1,D2,D2)
  # compute kernel k-means clusters with centers = 2, and sigma = t = 2 in dimension 0
  clust <- diagram_kkmeans(diagrams = g,centers = 2,dim = 0,t = 2,sigma = 2,num_workers = 2)
  
  # repeat with a precomputed Gram matrix, which gives the same result but runs much faster
  K <- gram_matrix(diagrams = g,dim = 0,num_workers = 2,t = 2,sigma = 2)
  cluster <- diagram_kkmeans(diagrams = g,K = K,centers = 2,dim = 0,sigma = 2,t = 2)
  
}
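
The Description suggests comparing withinss over several numbers of clusters. A hedged sketch of that comparison follows, assuming the kernlab accessor withinss() applies to the stored specc object and that kernlab is installed. The helper sample_diagram, the choice of six diagrams, and the candidate values 2 and 3 are illustrative assumptions; the tryCatch guard is a precaution in case a run fails on such a small input.

```r
if (require("TDApplied") && require("TDAstats") &&
    requireNamespace("kernlab", quietly = TRUE)) {
  # hypothetical helper: diagram of 20 points sampled from TDAstats::circle2d
  sample_diagram <- function() {
    TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 20), ],
                                 dim = 1, threshold = 2)
  }
  g <- lapply(1:6, function(i) sample_diagram())

  # compute the Gram matrix once and reuse it for every candidate value
  K <- gram_matrix(diagrams = g, dim = 0, t = 2, sigma = 2, num_workers = 2)

  candidate_centers <- 2:3
  total_withinss <- sapply(candidate_centers, function(k) {
    tryCatch({
      clust <- diagram_kkmeans(diagrams = g, K = K, centers = k, dim = 0,
                               t = 2, sigma = 2, num_workers = 2)
      # sum the per-cluster within-cluster sums of squares
      sum(kernlab::withinss(clust$clustering))
    }, error = function(e) NA_real_)
  })
  names(total_withinss) <- candidate_centers
  print(total_withinss)
}
```

A value of centers after which the total stops decreasing noticeably is one common heuristic for choosing the number of clusters.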