predict_diagram_kkmeans {TDApplied}    R Documentation

Predict the cluster labels for new persistence diagrams using a pre-computed clustering.

Description

Returns, for each new persistence diagram, the label of the nearest kkmeans cluster center, i.e. the one with the highest kernel value. This allows old cluster models to be reused for new tasks or for cross validation.
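
The idea of assigning by highest kernel value can be illustrated with a small base R sketch. It is not TDApplied's internal code: the helper assign_by_kernel and the toy matrix K_toy are hypothetical names introduced here, and the sketch assumes one particular reading of "highest kernel value", namely the mean kernel value between a new item and a cluster's members (which equals the feature-space inner product with that cluster's center).

```r
# Hypothetical helper (not part of TDApplied): label each row of a cross Gram
# matrix by the cluster whose members have the highest mean kernel value.
assign_by_kernel <- function(K, labels) {
  stopifnot(is.matrix(K), ncol(K) == length(labels))
  clusters <- sort(unique(labels))
  # mean kernel value between each new item (row) and each cluster's members
  scores <- sapply(clusters, function(cl) {
    rowMeans(K[, labels == cl, drop = FALSE])
  })
  # keep a rows-by-clusters matrix even when K has a single row
  scores <- matrix(scores, nrow = nrow(K))
  clusters[max.col(scores, ties.method = "first")]
}

# Toy cross Gram matrix (made-up values): 2 new items in the rows,
# 4 previously clustered items in the columns
K_toy <- matrix(c(0.9, 0.8, 0.1, 0.2,
                  0.2, 0.1, 0.7, 0.9),
                nrow = 2, byrow = TRUE)
labels_toy <- c(1, 1, 2, 2)
predicted <- assign_by_kernel(K_toy, labels_toy)
```

Here the first new item is most similar to the items labelled 1 and the second to the items labelled 2, so the sketch assigns them to those clusters in that order.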

Usage

predict_diagram_kkmeans(
  new_diagrams,
  K = NULL,
  clustering,
  num_workers = parallelly::availableCores(omit = 1)
)

Arguments

new_diagrams

a list of persistence diagrams, each of which is either the output of a persistent homology calculation like ripsDiag/calculate_homology/PyH, or the output of diagram_to_df. Only one of 'new_diagrams' and 'K' needs to be supplied.

K

an optional precomputed cross Gram matrix of the new diagrams and the diagrams used in 'clustering', default NULL. If not NULL then 'new_diagrams' does not need to be supplied.

clustering

the output of a diagram_kkmeans function call, of class 'diagram_kkmeans'.

num_workers

the number of cores used for parallel computation, default is one less than the number of cores on the machine.

Value

a vector of the predicted cluster labels for the new diagrams.

Author(s)

Shael Brown - shaelebrown@gmail.com

See Also

diagram_kkmeans for clustering persistence diagrams.

Examples


if(require("TDAstats"))
{
  # create two diagrams
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  g <- list(D1,D1,D2,D2)

  # calculate kernel k-means clusters with centers = 2, and sigma = t = 2 in dimension 0
  clust <- diagram_kkmeans(diagrams = g,centers = 2,dim = 0,t = 2,sigma = 2,num_workers = 2)

  # create two new diagrams
  D3 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  D4 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,20),],
                      dim = 1,threshold = 2)
  g_new <- list(D3,D4)

  # predict cluster labels
  predict_diagram_kkmeans(new_diagrams = g_new,clustering = clust,num_workers = 2)
  
  # predict cluster labels from a precomputed cross Gram matrix; this gives the
  # same result but is much faster
  K <- gram_matrix(diagrams = g_new,other_diagrams = clust$diagrams,
                   dim = clust$dim,t = clust$t,sigma = clust$sigma,
                   num_workers = 2)
  predict_diagram_kkmeans(K = K,clustering = clust)
  
}

[Package TDApplied version 3.0.3 Index]