distance_matrix {TDApplied}    R Documentation

Compute a distance matrix from a list of persistence diagrams.

Description

Calculate, in parallel, the distance matrix d either for a single list of persistence diagrams (D_1,D_2,...,D_n), i.e. d[i,j] = d(D_i,D_j), or between two lists, (D_1,D_2,...,D_n) and (D'_1,D'_2,...,D'_m), i.e. d[i,j] = d(D_i,D'_j).
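The definition d[i,j] = d(D_i,D_j) can be checked against a plain double loop over diagram_distance. The following is a minimal sketch, not the package's implementation: the three diagrams are hand-built for illustration, assuming the three-column (dimension, birth, death) data frame layout returned by diagram_to_df, and the names d_seq and d_par are hypothetical.

```r
# Sketch only: hand-built diagrams (assumed diagram_to_df layout).
if (requireNamespace("TDApplied", quietly = TRUE)) {
  diagrams <- list(
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(1.0, 0.5)),
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(0.8, 0.3)),
    data.frame(dimension = 0, birth = 0, death = 1.2)
  )
  n <- length(diagrams)

  # sequential reference: d_seq[i, j] = d(D_i, D_j)
  d_seq <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d_seq[i, j] <- TDApplied::diagram_distance(
        diagrams[[i]], diagrams[[j]],
        dim = 0, distance = "wasserstein", p = 2
      )
    }
  }

  # the same quantities from distance_matrix, using 2 cores
  d_par <- TDApplied::distance_matrix(
    diagrams = diagrams, dim = 0,
    distance = "wasserstein", p = 2, num_workers = 2
  )

  # compare the two matrices entry by entry, up to numerical tolerance
  print(all.equal(d_seq, d_par, check.attributes = FALSE))
}
```

The loop makes n^2 calls one after another, which is the work that distance_matrix distributes across workers.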

Usage

distance_matrix(
  diagrams,
  other_diagrams = NULL,
  dim = 0,
  distance = "wasserstein",
  p = 2,
  sigma = NULL,
  rho = NULL,
  num_workers = parallelly::availableCores(omit = 1)
)

Arguments

diagrams

a list of persistence diagrams, each either the output of a persistent homology calculation such as ripsDiag/calculate_homology/PyH, or the output of diagram_to_df.

other_diagrams

either NULL (default) or another list of persistence diagrams to compute a cross-distance matrix.

dim

the non-negative integer homological dimension in which the distance is to be computed, default 0.

distance

a character string determining which metric to use, either "wasserstein" (default) or "fisher".

p

a number representing the Wasserstein power parameter, at least 1 and default 2.

sigma

a positive number representing the bandwidth of the Fisher information metric, default NULL.

rho

an optional positive number representing the heuristic for Fisher information metric approximation, see diagram_distance. Default NULL. If not NULL, the matrix is calculated sequentially; functions in the "exec" directory of the package can be loaded to calculate distance matrices in parallel with approximation.

num_workers

the number of cores used for parallel computation, default is one less than the number of cores on the machine.
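As an illustration of the num_workers default, the sketch below evaluates the same expression that appears in Usage and then caps it. The cap of 2 and the variable names are assumptions made for this example only, for instance to stay polite on a shared machine.

```r
# Sketch only: inspect the default worker count and cap it.
if (requireNamespace("parallelly", quietly = TRUE)) {
  # the default from Usage: all available cores but one
  default_workers <- parallelly::availableCores(omit = 1)

  # hypothetical cap for illustration
  workers <- min(2, default_workers)
  print(workers)
}
```

The resulting value can then be passed as num_workers in a call to distance_matrix.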

Details

Distance matrices of persistence diagrams are used in downstream analyses, such as the diagram_mds, permutation_test and diagram_ksvm functions. If 'distance' is "fisher" then 'sigma' must not be NULL. Because the matrix is computed sequentially when approximating the Fisher information metric (i.e. when 'rho' is not NULL), approximation is only recommended when the persistence diagrams contain many points and the number of available cores is small.
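The sketch below illustrates two points from this section under stated assumptions: a "fisher" matrix computed with 'sigma' supplied, and a downstream use of that matrix. The diagrams are hand-built in the assumed diagram_to_df layout, sigma = 1 is an arbitrary choice, and stats::cmdscale from base R stands in for a downstream analysis; it is not claimed to reproduce what diagram_mds does internally.

```r
# Sketch only: Fisher distance matrix, then classical MDS on it.
if (requireNamespace("TDApplied", quietly = TRUE)) {
  diagrams <- list(
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(1.0, 0.5)),
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(0.8, 0.3)),
    data.frame(dimension = 0, birth = 0, death = 1.2)
  )

  # 'sigma' must be supplied when distance = "fisher"
  d_fisher <- TDApplied::distance_matrix(
    diagrams = diagrams, dim = 0,
    distance = "fisher", sigma = 1, num_workers = 2
  )

  # downstream use: embed the three diagrams as points in the plane
  embedding <- stats::cmdscale(d_fisher, k = 2)
  print(embedding)
}
```

Each row of the embedding corresponds to one diagram, in the order of the input list.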

Value

the numeric distance matrix.
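To make the shape of the returned matrix concrete, here is a small sketch under stated assumptions: the diagrams are hand-built in the assumed diagram_to_df layout, and the second list is deliberately shorter than the first, on the assumption that the two lists may differ in length. From d[i,j] = d(D_i,D'_j), rows should index 'diagrams' and columns 'other_diagrams'.

```r
# Sketch only: inspect the shape of square and cross-distance matrices.
if (requireNamespace("TDApplied", quietly = TRUE)) {
  g1 <- list(
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(1.0, 0.5)),
    data.frame(dimension = c(0, 0), birth = c(0, 0), death = c(0.8, 0.3)),
    data.frame(dimension = 0, birth = 0, death = 1.2)
  )
  g2 <- g1[1:2]  # a shorter second list

  # single list: one row and one column per diagram
  d_square <- TDApplied::distance_matrix(diagrams = g1, dim = 0, num_workers = 2)
  print(dim(d_square))

  # two lists: rows follow 'diagrams', columns follow 'other_diagrams'
  d_cross <- TDApplied::distance_matrix(
    diagrams = g1, other_diagrams = g2, dim = 0, num_workers = 2
  )
  print(dim(d_cross))
}
```

Only the single-list matrix is expected to be symmetric with a zero diagonal; a cross-distance matrix is generally rectangular.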

Author(s)

Shael Brown - shaelebrown@gmail.com

See Also

diagram_distance for individual distance calculations.

Examples


if(require("TDAstats"))
{
  # create two diagrams
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  g <- list(D1,D2)

  # calculate their distance matrix in dimension 0 with the persistence Fisher metric
  # using 2 cores
  D <- distance_matrix(diagrams = g,dim = 0,distance = "fisher",sigma = 1,num_workers = 2)

  # calculate their distance matrix in dimension 0 with the 2-Wasserstein metric
  # using 2 cores
  D <- distance_matrix(diagrams = g,dim = 0,distance = "wasserstein",p = 2,num_workers = 2)

  # now compute the cross-distance matrix of g with itself, which equals the previous matrix
  D_cross <- distance_matrix(diagrams = g,other_diagrams = g,
                             dim = 0,distance = "wasserstein",
                             p = 2,num_workers = 2)
}

[Package TDApplied version 3.0.3 Index]