unif_test {sphunif}R Documentation

Circular and (hyper)spherical uniformity tests

Description

Implementation of several uniformity tests on the (hyper)sphere S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}, p\ge 2, with calibration either in terms of their asymptotic/exact distributions, if available, or Monte Carlo.

unif_test receives a sample of directions {\bf X}_1,\ldots,{\bf X}_n\in S^{p-1} in Cartesian coordinates, except for the circular case (p=2) in which the sample can be represented in terms of angles \Theta_1,\ldots,\Theta_n\in [0, 2\pi).

unif_test allows to perform several tests within a single call, facilitating thus the exploration of a dataset by applying several tests.

Usage

unif_test(data, type = "all", p_value = "asymp", alpha = c(0.1, 0.05,
  0.01), M = 10000, stats_MC = NULL, crit_val = NULL,
  data_sorted = FALSE, K_max = 10000, method = "I", CCF09_dirs = NULL,
  CJ12_beta = 0, CJ12_reg = 3, cov_a = 2 * pi, Cressie_t = 1/3,
  K_CCF09 = 25, Poisson_rho = 0.5, Pycke_q = 0.5, Rayleigh_m = 1,
  Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0, 1),
  Softmax_kappa = 1, Stereo_a = 0, ...)

Arguments

data

sample to perform the test. A matrix of size c(n, p) containing a sample of size n of directions (in Cartesian coordinates) on S^{p-1}. Alternatively if p = 2, a matrix of size c(n, 1) containing the n angles on [0, 2\pi) of the circular sample on S^{1}. Other objects accepted are an array of size c(n, p, 1) with directions (in Cartesian coordinates), or a vector of size n or an array of size c(n, 1, 1) with angular data. Must not contain NA's.

type

type of test to be applied. A character vector containing any of the following types of tests, depending on the dimension p:

If type = "all" (default), then type is set as avail_cir_tests or avail_sph_tests, depending on the value of p.

p_value

type of p-value computation. Either "MC" for employing the approximation by Monte Carlo of the exact null distribution, "asymp" (default) for the use of the asymptotic/exact null distribution (if available), or "crit_val" for approximation by means of the table of critical values crit_val.

alpha

vector with significance levels. Defaults to c(0.10, 0.05, 0.01).

M

number of Monte Carlo replications for approximating the null distribution when approx = "MC". Also, number of Monte Carlo samples for approximating the asymptotic distributions based on weighted sums of chi squared random variables. Defaults to 1e4.

stats_MC

a data frame of size c(M, length(type)), with column names containing the character vector type, that results from extracting $stats_MC from a call to unif_stat_MC. If provided, the computation of Monte Carlo statistics when approx = "MC" is skipped. stats_MC is checked internally to see if it is sorted. Internally computed if NULL (default).

crit_val

table with critical values for the tests, to be used if p_value = "crit_val". A data frame, with column names containing the character vector type and rows corresponding to the significance levels alpha, that results from extracting $crit_val_MC from a call to unif_stat_MC. Internally computed if NULL (default).

data_sorted

is the circular data sorted? If TRUE, certain statistics are faster to compute. Defaults to FALSE.

K_max

integer giving the truncation of the series that compute the asymptotic p-value of a Sobolev test. Defaults to 1e4.

method

method for approximating the density, distribution, or quantile function of the weighted sum of chi squared random variables. Must be "I" (Imhof), "SW" (Satterthwaite–Welch), "HBE" (Hall–Buckley–Eagleson), or "MC" (Monte Carlo; only for distribution or quantile functions). Defaults to "I".

CCF09_dirs

a matrix of size c(n_proj, p) containing n_proj random directions (in Cartesian coordinates) on S^{p-1} to perform the CCF09 test. If NULL (default), a sample of size n_proj = 50 directions is computed internally.

CJ12_beta

\beta parameter in the exponential regime of CJ12 test, a positive real.

CJ12_reg

type of asymptotic regime for CJ12 test, either 1 (sub-exponential regime), 2 (exponential), or 3 (super-exponential; default).

cov_a

a_n = a / n parameter used in the length of the arcs of the coverage-based tests. Must be positive. Defaults to 2 * pi.

Cressie_t

t parameter for the Cressie test, a real in (0, 1). Defaults to 1 / 3.

K_CCF09

integer giving the truncation of the series present in the asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to 25.

Poisson_rho

\rho parameter for the Poisson test, a real in [0, 1). Defaults to 0.5.

Pycke_q

q parameter for the Pycke "q-test", a real in (0, 1). Defaults to 1 / 2.

Rayleigh_m

integer m for the m-modal Rayleigh test. Defaults to m = 1 (the standard Rayleigh test).

Riesz_s

s parameter for the s-Riesz test, a real in (0, 2). Defaults to 1.

Rothman_t

t parameter for the Rothman test, a real in (0, 1). Defaults to 1 / 3.

Sobolev_vk2

weights for the finite Sobolev test. A non-negative vector or matrix. Defaults to c(0, 0, 1).

Softmax_kappa

\kappa parameter for the Softmax test, a non-negative real. Defaults to 1.

Stereo_a

a parameter for the Stereo test, a real in [-1, 1]. Defaults to 0.

...

If p_value = "MC" or p_value = "crit_val", optional performance parameters to be passed to unif_stat_MC: chunks, cores, and seed. If p_value = "MC", additional parameters to unif_stat_distr.

Details

All the tests reject for large values of the test statistic, so the critical values for the significance levels alpha correspond to the alpha-upper quantiles of the null distribution of the test statistic.

When p_value = "asymp", tests that do not have an implemented or known asymptotic are omitted, and a warning is generated.

When p_value = "MC", it is possible to have a progress bar indicating the Monte Carlo simulation progress if unif_test is wrapped with progressr::with_progress or if progressr::handlers(global = TRUE) is invoked (once) by the user. See the examples below. The progress bar is updated with the number of finished chunks.

All the statistics are continuous random variables except the Hodges–Ajne statistic ("Hodges_Ajne"), the Cressie statistic ("Cressie"), and the number of (different) uncovered spacings ("Num_uncover"). These three statistics are discrete random variables.

The Monte Carlo calibration for the CCF09 test is made conditionally on the choice of
CCF09_dirs. That is, all the Monte Carlo statistics share the same random directions.

Except for CCF09_dirs, K_CCF09, and CJ12_reg, all the test-specific parameters are vectorized.

Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).

Value

If only a single test is performed, a list with class htest containing the following components:

If several tests are performed, a type-named list with entries for each test given by the above list.

References

García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.

Examples

## Asymptotic distribution

# Circular data
n <- 10
samp_cir <- r_unif_cir(n = n)

# Matrix
unif_test(data = samp_cir, type = "Ajne", p_value = "asymp")

# Vector
unif_test(data = samp_cir[, 1], type = "Ajne", p_value = "asymp")

# Array
unif_test(data = array(samp_cir, dim = c(n, 1, 1)), type = "Ajne",
          p_value = "asymp")

# Several tests
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "asymp")

# Spherical data
n <- 10
samp_sph <- r_unif_sph(n = n, p = 3)

# Array
unif_test(data = samp_sph, type = "Bingham", p_value = "asymp")

# Matrix
unif_test(data = samp_sph[, , 1], type = "Bingham", p_value = "asymp")

# Several tests
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "asymp")

## Monte Carlo

# Circular data
unif_test(data = samp_cir, type = "Ajne", p_value = "MC")
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "MC")

# Spherical data
unif_test(data = samp_sph, type = "Bingham", p_value = "MC")
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC")

# Caching stats_MC
stats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p = 2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p = 3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "MC", stats_MC = stats_MC_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
          stats_MC = stats_MC_sph)

## Critical values

# Circular data
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "crit_val")

# Spherical data
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val")

# Caching crit_val
crit_val_cir <- unif_stat_MC(n = n, p = 2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p = 3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "crit_val", crit_val = crit_val_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val",
          crit_val = crit_val_sph)

## Specific arguments

# Rothman
unif_test(data = samp_cir, type = "Rothman", Rothman_t = 0.5)

# CCF09
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[1:2, , 1])
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[3:4, , 1])

## Using a progress bar when p_value = "MC"

# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
  format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
                 ":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
  clear = FALSE))

# Call unif_test() within with_progress()
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            chunks = 10, M = 1e3)
)

# With several cores
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            cores = 2, chunks = 10, M = 1e3)
)

# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session


[Package sphunif version 1.4.0 Index]