independence_test {TDApplied}    R Documentation

Independence test for two groups of persistence diagrams.

Description

Carries out inference to determine whether two groups of persistence diagrams are independent, based on kernel calculations (see Gretton et al. (2007), https://proceedings.neurips.cc/paper/2007/file/d5cfead94f5350c12c322b5b664544c1-Paper.pdf, for details). A small p-value in a given dimension suggests that the groups are not independent in that dimension.

Usage

independence_test(
  g1,
  g2,
  dims = c(0, 1),
  sigma = 1,
  rho = NULL,
  t = 1,
  num_workers = parallelly::availableCores(omit = 1),
  verbose = FALSE,
  Ks = NULL,
  Ls = NULL
)

Arguments

g1

the first group of persistence diagrams, where each diagram is either the output of a persistent homology calculation such as ripsDiag/calculate_homology/PyH, or the output of diagram_to_df.

g2

the second group of persistence diagrams, where each diagram is either the output of a persistent homology calculation such as ripsDiag/calculate_homology/PyH, or the output of diagram_to_df.

dims

a non-negative integer vector of the homological dimensions in which the test is to be carried out, default c(0,1).

sigma

a positive number representing the bandwidth for the Fisher information metric, default 1.

rho

an optional positive number representing the heuristic for Fisher information metric approximation, see diagram_distance. Default NULL. If supplied, calculation of Gram matrices is sequential.

t

a positive number representing the scale for the persistence Fisher kernel, default 1.

num_workers

the number of cores used for parallel computation, default is one less than the number of cores on the machine.

verbose

a boolean flag indicating whether the run time of the function call should be printed, default FALSE.

Ks

an optional list of precomputed Gram matrices for the first group of diagrams, with one element for each dimension. If not NULL and 'Ls' is not NULL then 'g1' and 'g2' do not need to be supplied.

Ls

an optional list of precomputed Gram matrices for the second group of diagrams, with one element for each dimension. If not NULL and 'Ks' is not NULL then 'g1' and 'g2' do not need to be supplied.

Details

The test is carried out with a parametric null distribution, making it much faster than non-parametric approaches. If all of the diagrams in either g1 or g2 are the same in some dimension, then some p-values may be NaN.
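
As a rough illustration of the kernel calculation behind the test, the sketch below computes the biased empirical HSIC statistic described in the reference, trace(KHLH)/m^2, from two Gram matrices using base R only. The helper name hsic_statistic is hypothetical, and this is not necessarily how TDApplied computes its test statistic internally. The reference goes on to approximate the null distribution with a gamma distribution, a step this sketch omits.

```r
# Hypothetical helper, not part of TDApplied: biased empirical HSIC
# statistic trace(K H L H) / m^2 from Gretton et al. (2007).
hsic_statistic <- function(K, L) {
  # both inputs must be square Gram matrices of the same size
  stopifnot(is.matrix(K), is.matrix(L),
            nrow(K) == ncol(K), identical(dim(K), dim(L)))
  m <- nrow(K)
  # centering matrix H = I - (1/m) * 1 1^T
  H <- diag(m) - matrix(1 / m, nrow = m, ncol = m)
  sum(diag(K %*% H %*% L %*% H)) / m^2
}

# toy Gram matrices standing in for the output of gram_matrix
K <- diag(3)
L <- diag(3)
hsic_statistic(K, L)
```

A constant Gram matrix (all diagrams identical in that dimension) is annihilated by the centering matrix, so the statistic is zero. This is the kind of degenerate case in which the NaN p-values mentioned above can occur.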

Value

a list with the following elements:

dimensions

the input 'dims' argument.

test_statisics

a numeric vector of the test statistic value in each dimension.

p_values

a numeric vector of the p-values in each dimension.

run_time

the run time of the function call, containing time units.
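
A minimal sketch of how the returned list might be inspected, assuming a significance level of 0.05. The list res below is a hypothetical stand-in with made-up values, not real output of independence_test. Because p-values can be NaN, they are filtered out explicitly.

```r
# Hypothetical stand-in for the list returned by independence_test;
# the numbers are made up for illustration only.
res <- list(dimensions = c(0, 1), p_values = c(0.03, NaN))

alpha <- 0.05  # assumed significance level

# is.na() is TRUE for NaN, so dimensions with undefined p-values
# are never reported as significant
significant <- !is.na(res$p_values) & res$p_values < alpha

# dimensions in which independence would be rejected at level alpha
res$dimensions[significant]
```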

Author(s)

Shael Brown - shaelebrown@gmail.com

References

Gretton A et al. (2007). "A Kernel Statistical Test of Independence." https://proceedings.neurips.cc/paper/2007/file/d5cfead94f5350c12c322b5b664544c1-Paper.pdf.

See Also

permutation_test for an inferential group difference test for groups of persistence diagrams.

Examples


if(require("TDAstats"))
{
  # create two independent groups of diagrams of length 6, which
  # is the minimum length
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  g1 <- list(D1,D2,D2,D2,D2,D2)
  g2 <- list(D2,D1,D1,D1,D1,D1)

  # do independence test with sigma = t = 1 in dimension 0, using
  # precomputed Gram matrices
  K <- gram_matrix(diagrams = g1, dim = 0, t = 1, sigma = 1, num_workers = 2)
  L <- gram_matrix(diagrams = g2, dim = 0, t = 1, sigma = 1, num_workers = 2)
  indep_test <- independence_test(Ks = list(K), Ls = list(L), dims = c(0))
  
}

[Package TDApplied version 3.0.3]