permutation_test {TDApplied}    R Documentation

Permutation test for finding group differences between persistence diagrams.

Description

A non-parametric, ANOVA-like test for persistence diagrams (see https://link.springer.com/article/10.1007/s41468-017-0008-7 for details). In each desired dimension a test statistic (loss) is calculated. The group labels are then shuffled for some number of iterations and the loss is recomputed each time, which generates a null distribution for the test statistic. The test returns a p-value in each desired dimension.
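For reference, here is the loss of Robinson and Turner (2017) as we understand it, written with the 'p' and 'q' arguments. Groups m = 1, ..., k contain diagrams X_{m,1}, ..., X_{m,n_m}, and φ ranges over bijections between the diagrams augmented with the diagonal Δ. The exact normalization and the ℓ∞ ground metric inside W_p are assumptions here, not read from TDApplied's source. With p = Inf the sum becomes a maximum, which gives the bottleneck distance.

```latex
W_p(X, Y) = \left( \inf_{\varphi : X \cup \Delta \to Y \cup \Delta}
  \sum_{x \in X \cup \Delta} \lVert x - \varphi(x) \rVert_\infty^{p} \right)^{1/p},
\qquad
F_{p,q} = \sum_{m=1}^{k} \frac{1}{2\, n_m (n_m - 1)}
  \sum_{i=1}^{n_m} \sum_{j=1}^{n_m} W_p\!\left(X_{m,i}, X_{m,j}\right)^{q}
```

Shuffling the labels reassigns diagrams to groups while keeping every n_m fixed. Recomputing F_{p,q} under each shuffle yields the null distribution, and a small observed loss (tight groups) relative to that distribution gives a small p-value.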

Usage

permutation_test(
  ...,
  iterations = 20,
  p = 2,
  q = 2,
  dims = c(0, 1),
  dist_mats = NULL,
  group_sizes = NULL,
  paired = FALSE,
  distance = "wasserstein",
  sigma = NULL,
  rho = NULL,
  num_workers = parallelly::availableCores(omit = 1),
  verbose = FALSE
)

Arguments

...

lists of persistence diagrams. Each diagram is either the output of a persistent homology calculation such as ripsDiag/calculate_homology/PyH, or the output of diagram_to_df. Each list must contain at least 2 diagrams.

iterations

the number of iterations for permuting group labels, default 20.

p

the Wasserstein power parameter, a number at least 1 (or Inf when using the bottleneck distance), default 2.

q

a finite number at least 1 for exponentiation in the Turner loss function, default 2.

dims

a non-negative integer vector of the homological dimensions in which the test is to be carried out, default c(0,1).

dist_mats

an optional list of precomputed distance matrices, one for each dimension, whose rows and columns correspond to the unlisted groups of diagrams (in order), default NULL. If not NULL then no lists of diagrams need to be supplied.

group_sizes

a vector of group sizes, one for each group, to be supplied when 'dist_mats' is not NULL.

paired

a boolean flag indicating whether there is a second-order pairing between diagrams at the same index in different groups, default FALSE.

distance

a string which determines which type of distance calculation to carry out, either "wasserstein" (default) or "fisher".

sigma

the positive bandwidth for the Fisher information metric, default NULL.

rho

an optional positive number representing the heuristic for the Fisher information metric approximation (see diagram_distance), default NULL. If supplied, code execution is sequential.

num_workers

the number of cores used for parallel computation, default is one less than the number of cores on the machine.

verbose

a boolean flag indicating whether the run time of the function call should be printed, default FALSE.
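The sketch below shows how 'group_sizes' and 'paired' could translate into group labels for the rows of a distance matrix laid out as the unlisted groups in order. It is written in base R. The helpers group_labels and shuffle_labels are hypothetical and are not TDApplied functions. The paired scheme is an assumption modeled on Abdallah et al. (2021): diagrams at the same index in every group form a matched set, and labels are only shuffled inside each set.

```r
# Hypothetical helpers (not part of TDApplied) illustrating label handling.

# Label each row/column of a distance matrix whose rows follow the unlisted
# groups in order, e.g. list(D1, D2, D1, D2) with group_sizes = c(2, 2).
group_labels <- function(dist_mat, group_sizes) {
  # mirrors the requirement sum(group_sizes) == nrow == ncol
  stopifnot(nrow(dist_mat) == ncol(dist_mat),
            sum(group_sizes) == nrow(dist_mat))
  rep(seq_along(group_sizes), times = group_sizes)
}

# Unpaired: any permutation of all labels.
# Paired (assumes equal group sizes n): the i-th diagram of each group forms
# a matched set, and labels are only permuted within that set.
shuffle_labels <- function(labels, group_sizes, paired = FALSE) {
  if (!paired) return(sample(labels))
  k <- length(group_sizes)
  n <- group_sizes[1]
  stopifnot(all(group_sizes == n))
  for (i in seq_len(n)) {
    idx <- (seq_len(k) - 1) * n + i  # positions of matched set i
    labels[idx] <- labels[idx][sample.int(k)]
  }
  labels
}

labels <- group_labels(matrix(0, nrow = 4, ncol = 4), group_sizes = c(2, 2))
labels  # 1 1 2 2
shuffle_labels(labels, c(2, 2), paired = TRUE)
```

Pairing restricts the permutations to relabelings that keep each matched set intact. This is how pairing can reduce the variance of the null distribution.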

Details

The test is carried out in parallel and is optimized so that already-calculated distances are not recomputed. As a result, memory issues may occur when the number of persistence diagrams is very large. As in Abdallah et al. (2021) (https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf), an option is provided for pairing diagrams between groups. Pairing reduces variance and thereby boosts statistical power. As suggested in the original paper, functionality is provided for an arbitrary number of groups (not just 2). A small p-value in a dimension suggests that the groups are different (separated) in that dimension.

If 'distance' is "fisher" then 'sigma' must not be NULL. TDAstats also has a 'permutation_test' function, so care should be taken to call the desired function when using TDApplied together with TDAstats. If 'dist_mats' is supplied then the sum of the elements of 'group_sizes' must equal the number of rows and columns of each of its elements.
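Permuting labels never changes the diagrams themselves, so all pairwise distances can be computed once and each permutation only re-sums entries of the same matrix. The base R sketch below illustrates that idea on a single precomputed distance matrix. It is not TDApplied's implementation. The function names are hypothetical, and the loss normalization 1/(2 n_m (n_m - 1)) and the (count + 1)/(iterations + 1) p-value convention are assumptions. The price of this reuse is an n × n matrix per dimension. Assuming dense double-precision storage, 20,000 diagrams already need 20,000² × 8 bytes = 3.2 GB for each dimension.

```r
# Minimal sketch (base R, not TDApplied's code) of a Robinson-Turner style
# permutation test that reuses ONE precomputed distance matrix.

# Joint loss: for each group with n_g >= 2 members, sum d^q over all ordered
# pairs (i, j) in the group and divide by 2 * n_g * (n_g - 1).
turner_loss <- function(dist_mat, labels, q = 2) {
  total <- 0
  for (g in unique(labels)) {
    idx <- which(labels == g)
    n_g <- length(idx)
    total <- total + sum(dist_mat[idx, idx]^q) / (2 * n_g * (n_g - 1))
  }
  total
}

perm_test_sketch <- function(dist_mat, group_sizes, iterations = 20, q = 2) {
  labels <- rep(seq_along(group_sizes), times = group_sizes)
  observed <- turner_loss(dist_mat, labels, q)
  # null distribution: loss under randomly shuffled group labels
  permvals <- vapply(seq_len(iterations),
                     function(i) turner_loss(dist_mat, sample(labels), q),
                     numeric(1))
  # a small loss means tight groups, so count permutations with loss <= observed
  p_value <- (sum(permvals <= observed) + 1) / (iterations + 1)
  list(test_statistic = observed, permvals = permvals, p_value = p_value)
}

# usage with numbers standing in for diagrams: two well-separated groups
set.seed(1)
x <- c(rnorm(5, mean = 0), rnorm(5, mean = 10))
res <- perm_test_sketch(as.matrix(dist(x)), group_sizes = c(5, 5), iterations = 99)
res$p_value
```

Only a perfect split of the two groups reproduces the observed loss, so for such well-separated groups the reported p-value should be close to its minimum of 1/(iterations + 1).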

Value

a list with the following elements:

dimensions

the input 'dims' argument.

permvals

a numeric vector of length 'iterations' with the permuted loss value for each iteration (permutation).

test_statisics

a numeric vector of the test statistic value in each dimension.

p_values

a numeric vector of the p-values in each dimension.

run_time

the run time of the function call, containing time units.
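Because each p-value is estimated from a finite number of permutations, its resolution is limited by 'iterations'. The short base R sketch below computes the smallest reportable p-value. It assumes the common (count + 1)/(iterations + 1) convention, which is not confirmed from TDApplied's source. The function min_p_value is hypothetical.

```r
# Smallest p-value a permutation test can report for a given number of
# iterations, ASSUMING the (count + 1) / (iterations + 1) convention.
min_p_value <- function(iterations) 1 / (iterations + 1)

min_p_value(c(20, 99, 999))  # about 0.0476, 0.01, 0.001
```

Under this assumption, the default iterations = 20 cannot report anything below about 0.048, so stricter significance thresholds call for more iterations.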

Author(s)

Shael Brown - shaelebrown@gmail.com

References

Robinson T, Turner K (2017). "Hypothesis testing for topological data analysis." https://link.springer.com/article/10.1007/s41468-017-0008-7.

Abdallah H et al. (2021). "Statistical Inference for Persistent Homology applied to fMRI." https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.

See Also

independence_test for an inferential test of independence for two groups of persistence diagrams.

Examples


if(require("TDAstats"))
{
  # create two groups of diagrams
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,10),],
                                     dim = 0,threshold = 2)
  g1 <- list(D1,D2)
  g2 <- list(D1,D2)

  # run the test in dimension 0 with 1 iteration; note that the TDAstats package
  # function "permutation_test" can mask TDApplied's function, so we specify
  # explicitly which function we are using
  perm_test <- TDApplied::permutation_test(g1,g2,iterations = 1,
                                           num_workers = 2,
                                           dims = c(0))
                                 
  # repeat with a precomputed distance matrix: the results are the same up to
  # the randomness of the permutations, but the computation is much faster
  D <- TDApplied::distance_matrix(diagrams = list(D1,D2,D1,D2),dim = 0,
                                  num_workers = 2)
  perm_test <- TDApplied::permutation_test(dist_mats = list(D),group_sizes = c(2,2),
                                           dims = c(0))
}

[Package TDApplied version 3.0.3 Index]