diagram_kkmeans {TDApplied}    R Documentation

Cluster a group of persistence diagrams using kernel k-means.

Description

Finds latent cluster labels for a group of persistence diagrams using a kernelized version of the popular k-means algorithm. An optimal number of clusters may be determined by examining the withinss slot of the clustering object over several values of centers (the number of clusters k).
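To make the idea concrete, here is a minimal base-R sketch of kernel k-means that works only from a Gram matrix, following the feature-space view of Dhillon et al. (2004). The function names, the random balanced initialization and the toy Gaussian (RBF) kernel are assumptions for illustration. This is not kernlab's kkmeans implementation, so labels and withinss values will generally differ from those returned by diagram_kkmeans.

```r
# Hypothetical sketch of kernel k-means on a precomputed Gram matrix K (n x n).
# Lloyd-style iterations in feature space; NOT kernlab's kkmeans implementation.

# squared feature-space distance from every item to every cluster mean:
# ||phi(x_i) - mu_c||^2 = K[i,i] - 2 * mean_j K[i,j] + mean_{j,l} K[j,l], j,l in cluster c
feature_dist2 <- function(K, labels, centers) {
  d2 <- matrix(Inf, nrow(K), centers)  # empty clusters stay infinitely far away
  for (c in seq_len(centers)) {
    members <- which(labels == c)
    n_c <- length(members)
    if (n_c == 0) next
    d2[, c] <- diag(K) - 2 * rowSums(K[, members, drop = FALSE]) / n_c +
      sum(K[members, members]) / n_c^2
  }
  d2
}

kernel_kmeans_sketch <- function(K, centers, max_iter = 100, seed = 1) {
  set.seed(seed)
  # random balanced initial assignment, so no cluster starts empty (needs n >= centers)
  labels <- sample(rep_len(seq_len(centers), nrow(K)))
  for (iter in seq_len(max_iter)) {
    new_labels <- apply(feature_dist2(K, labels, centers), 1, which.min)
    if (all(new_labels == labels)) break
    labels <- new_labels
  }
  d2 <- feature_dist2(K, labels, centers)
  # within-cluster sum of squared feature-space distances, one value per cluster
  withinss <- vapply(seq_len(centers),
                     function(c) sum(d2[labels == c, c]), numeric(1))
  list(cluster = labels, withinss = withinss)
}

# toy data: two well separated 1-D groups and a Gaussian (RBF) Gram matrix
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2)
K <- exp(-as.matrix(dist(x))^2)
res <- kernel_kmeans_sketch(K, centers = 2)
print(res$cluster)

# elbow heuristic: total within-cluster sum of squares for 1 to 4 clusters
tot_withinss <- sapply(1:4, function(k) sum(kernel_kmeans_sketch(K, k)$withinss))
print(tot_withinss)
```

The last lines compare the total within-cluster sum of squares for 1 to 4 clusters; a sharp drop followed by a flattening curve (here between 1 and 2 clusters) is the kind of elbow the Description suggests looking for.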

Usage

diagram_kkmeans(
  diagrams,
  K = NULL,
  centers,
  dim = 0,
  t = 1,
  sigma = 1,
  rho = NULL,
  num_workers = parallelly::availableCores(omit = 1),
  ...
)

Arguments

diagrams

a list of n>=2 persistence diagrams, each of which is either the output of a persistent homology calculation such as ripsDiag/calculate_homology/PyH, or the output of the diagram_to_df function.

K

an optional precomputed Gram matrix of persistence diagrams, default NULL.

centers

the number of clusters to initialize. It can be no larger than the number of diagrams, although smaller values are recommended.

dim

the non-negative integer homological dimension in which the distance is to be computed, default 0.

t

a positive number representing the scale for the persistence Fisher kernel, default 1.

sigma

a positive number representing the bandwidth for the Fisher information metric, default 1.

rho

an optional positive number representing the heuristic for Fisher information metric approximation, see diagram_distance. Default NULL. If supplied, Gram matrix calculation is sequential.

num_workers

the number of cores used for parallel computation, default is one less than the number of cores on the machine.

...

additional arguments passed to the kkmeans function from the kernlab package.
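As we understand TDApplied's definition, t and sigma parametrize the persistence Fisher kernel k(D1, D2) = exp(-t * d_FIM(D1, D2)), where d_FIM is the Fisher information metric between Gaussian-smoothed diagrams with bandwidth sigma. The base-R sketch below computes such a kernel and a Gram matrix of the kind that can be supplied through K. It assumes each diagram is a non-empty two-column (birth, death) matrix for a single dimension, ignores the rho approximation and parallelism, and uses hypothetical helper names; TDApplied's own gram_matrix may differ in normalization or other details.

```r
# Hypothetical sketch of the persistence Fisher kernel exp(-t * d_FIM) and a Gram
# matrix built from it; TDApplied's own implementation may differ in its details.
# Each diagram is assumed to be a non-empty two-column (birth, death) matrix that
# already contains only the features of the chosen homological dimension.

# orthogonal projection of each diagram point onto the diagonal
diag_proj <- function(D) {
  mid <- (D[, 1] + D[, 2]) / 2
  cbind(mid, mid)
}

fisher_distance_sketch <- function(D1, D2, sigma = 1) {
  theta1 <- rbind(D1, diag_proj(D2))
  theta2 <- rbind(D2, diag_proj(D1))
  S <- rbind(theta1, theta2)  # points at which both densities are evaluated
  # Gaussian mixture (bandwidth sigma) centred at the rows of theta, evaluated on S;
  # the Gaussian normalising constant is omitted because it cancels below
  density_on_S <- function(theta) {
    sq <- outer(S[, 1], theta[, 1], "-")^2 + outer(S[, 2], theta[, 2], "-")^2
    rho <- rowSums(exp(-sq / (2 * sigma^2)))
    rho / sum(rho)
  }
  rho1 <- density_on_S(theta1)
  rho2 <- density_on_S(theta2)
  # arccos of the Bhattacharyya coefficient, clamped against rounding above 1
  acos(min(1, sum(sqrt(rho1 * rho2))))
}

fisher_kernel_sketch <- function(D1, D2, t = 1, sigma = 1) {
  exp(-t * fisher_distance_sketch(D1, D2, sigma))
}

# symmetric Gram matrix; the diagonal is 1 because d_FIM(D, D) = 0
gram_sketch <- function(diagrams, t = 1, sigma = 1) {
  n <- length(diagrams)
  K <- diag(n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      K[i, j] <- K[j, i] <- fisher_kernel_sketch(diagrams[[i]], diagrams[[j]], t, sigma)
    }
  }
  K
}

D1 <- matrix(c(0, 1, 0.2, 1.5), ncol = 2, byrow = TRUE)
D2 <- matrix(c(0, 0.8, 0.5, 2), ncol = 2, byrow = TRUE)
D3 <- matrix(c(0, 3), ncol = 2)
Kd <- gram_sketch(list(D1, D2, D3), t = 2, sigma = 2)
print(Kd)
```

Because every entry lies in (0, 1] and the matrix is symmetric with a unit diagonal, it can be passed to a kernel k-means routine in place of recomputing kernel values for each clustering run.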

Details

Returns the output of kkmeans on the desired Gram matrix of a group of persistence diagrams in a particular dimension. The additional list elements stored in the output are needed to estimate cluster labels for new persistence diagrams in the 'predict_diagram_kkmeans' function.
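As a rough illustration of why the training diagrams and kernel parameters must be stored, the base-R sketch below labels new items using only kernel values: each new item is assigned to the cluster whose feature-space mean is closest. The helper predict_kkmeans_sketch and the toy Gaussian (RBF) kernel are hypothetical, and predict_diagram_kkmeans may compute its labels differently.

```r
# Hypothetical sketch: assign new items to the nearest kernel k-means cluster using
# only kernel values. This is an assumption about how such a prediction can work,
# not the code of predict_diagram_kkmeans.
#   K_train    n x n Gram matrix of the clustered items
#   labels     their cluster labels
#   K_new      (number of new items) x n kernel values against the clustered items
#   k_new_self k(x, x) for each new item (1 for a normalized kernel such as RBF)
predict_kkmeans_sketch <- function(K_train, labels, K_new,
                                   k_new_self = rep(1, nrow(K_new))) {
  clusters <- sort(unique(labels))
  # squared feature-space distance from each new item to each cluster mean
  d2 <- sapply(clusters, function(c) {
    members <- which(labels == c)
    n_c <- length(members)
    k_new_self - 2 * rowSums(K_new[, members, drop = FALSE]) / n_c +
      sum(K_train[members, members]) / n_c^2
  })
  d2 <- matrix(d2, nrow = nrow(K_new))  # keep the matrix shape for a single new item
  clusters[apply(d2, 1, which.min)]
}

x <- c(0, 0.1, 0.2, 5, 5.1, 5.2)          # clustered items
labels <- c(1, 1, 1, 2, 2, 2)             # their (known) cluster labels
x_new <- c(0.05, 5.05)                    # new items to label
K_train <- exp(-outer(x, x, "-")^2)       # Gaussian (RBF) kernel values
K_new <- exp(-outer(x_new, x, "-")^2)
pred <- predict_kkmeans_sketch(K_train, labels, K_new)
print(pred)
```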

Value

a list of class 'diagram_kkmeans' containing the output of kkmeans on the Gram matrix, i.e. a list containing the elements

clustering

an S4 object of class specc, the output of a kkmeans function call. The '.Data' slot of this object contains the cluster memberships, the 'withinss' slot contains the within-cluster sum of squares for each cluster, etc.

diagrams

the input 'diagrams' argument.

dim

the input 'dim' argument.

t

the input 't' argument.

sigma

the input 'sigma' argument.

Author(s)

Shael Brown - shaelebrown@gmail.com

References

Dhillon, I and Guan, Y and Kulis, B (2004). "A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts." https://people.bu.edu/bkulis/pubs/spectral_techreport.pdf.

See Also

predict_diagram_kkmeans for predicting cluster labels of new diagrams.

Examples


if(require("TDAstats"))
{
  # create two diagrams from random samples of 20 points on a circle
  set.seed(1)
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  g <- list(D1,D1,D2,D2)

  # calculate kernel k-means clusters with centers = 2 and sigma = t = 2 in dimension 0
  clust <- diagram_kkmeans(diagrams = g,centers = 2,dim = 0,t = 2,sigma = 2,num_workers = 2)

  # repeat with a precomputed Gram matrix (same dim, t and sigma), which avoids
  # recomputing kernel values when clustering the same diagrams several times
  K <- gram_matrix(diagrams = g,dim = 0,t = 2,sigma = 2,num_workers = 2)
  clust2 <- diagram_kkmeans(diagrams = g,K = K,centers = 2,dim = 0,t = 2,sigma = 2)

}

[Package TDApplied version 3.0.3 Index]