bd.test {Ball}    R Documentation

Ball Divergence based Equality of Distributions Test

Description

Performs the nonparametric two-sample or K-sample Ball Divergence test for equality of multivariate distributions.

Usage

bd.test(x, ...)

## Default S3 method:
bd.test(
  x,
  y = NULL,
  num.permutations = 99,
  method = c("permutation", "limit"),
  distance = FALSE,
  size = NULL,
  seed = 1,
  num.threads = 0,
  kbd.type = c("sum", "maxsum", "max"),
  weight = c("constant", "variance"),
  ...
)

## S3 method for class 'formula'
bd.test(formula, data, subset, na.action, ...)

Arguments

x

a numeric vector, matrix, data.frame, or a list containing at least two numeric vectors, matrices, or data.frames.

...

further arguments to be passed to or from methods.

y

a numeric vector, matrix, or data.frame.

num.permutations

the number of permutation replications. When num.permutations = 0, the function just returns the Ball Divergence statistic. Default: num.permutations = 99.

method

if method = "permutation", a permutation procedure is carried out to compute the p-value; if method = "limit", an approximate null distribution is used when weight = "constant". Any unambiguous substring can be given. Default method = "permutation".

distance

if distance = TRUE, x will be treated as a distance matrix. Default: distance = FALSE.

size

a vector recording the sample size of each group.

seed

the random seed. Default: seed = 1.

num.threads

number of threads. If num.threads = 0, all available cores will be used. Default: num.threads = 0.

kbd.type

a character string specifying the K-sample Ball Divergence test statistic. It must be one of "sum", "maxsum", or "max". Any unambiguous substring can be given. Default: kbd.type = "sum".

weight

a character string specifying the weight form of Ball Divergence statistic. It must be one of "constant" or "variance". Any unambiguous substring can be given. Default: weight = "constant".

formula

a formula of the form response ~ group where response gives the data values and group a vector or factor of the corresponding groups.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Details

bd.test is a nonparametric test for the two-sample or K-sample problem. It can detect differences in distribution among K (K ≥ 2) samples even when the sample sizes are imbalanced. The test copes well with multivariate or complex datasets.

If only x is given, the statistic is computed from the original pooled samples, stacked in a matrix where each row is a multivariate observation, or from the distance matrix when distance = TRUE. The first size[1] rows of x are the first sample, the next size[2] rows of x are the second sample, and so on. If x is a list, its elements are taken as the samples to be compared; the list must therefore contain at least two numeric vectors, matrices, or data.frames.
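The row layout described above can be sketched in base R. This is only an illustration of how a size vector partitions a stacked matrix. It does not call bd.test and is not the package's internal code. The names group and samples are made up for the example.

```r
# Illustrative only: how a `size` vector partitions the rows of a stacked matrix.
set.seed(1)
size <- c(40, 50, 60)                        # sample size of each group
x <- matrix(rnorm(sum(size) * 2), ncol = 2)  # pooled sample, one observation per row

# Group label for every row: 40 ones, then 50 twos, then 60 threes.
group <- rep(seq_along(size), times = size)

# Split the row indices by group and rebuild one matrix per sample.
samples <- lapply(split(seq_len(nrow(x)), group),
                  function(idx) x[idx, , drop = FALSE])

# Number of rows in each recovered sample.
sapply(samples, nrow)
```

A list such as samples has the same shape as the list input that bd.test accepts in place of a stacked x plus size.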

bd.test uses the Ball Divergence statistic (see bd) to measure the difference between distributions and derives a p-value by repeating a random permutation of the pooled sample num.permutations times. When num.permutations = 0, the function simply returns the test statistic.
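The general shape of a permutation test can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation. The helper perm_pvalue is hypothetical. The statistic mean_gap (absolute difference in means) is a simple stand-in for Ball Divergence. The p-value formula (1 + number of replicates at least as large as the observed statistic) / (1 + R) is a common convention assumed here, not one confirmed for bd.test.

```r
# Hypothetical helper: permutation p-value for any two-sample statistic.
perm_pvalue <- function(x, y, stat_fun, num.permutations = 99, seed = 1) {
  set.seed(seed)
  pooled <- c(x, y)
  n1 <- length(x)
  observed <- stat_fun(x, y)

  # Recompute the statistic after randomly relabelling the pooled observations.
  replicates <- vapply(seq_len(num.permutations), function(i) {
    idx <- sample(length(pooled), n1)
    stat_fun(pooled[idx], pooled[-idx])
  }, numeric(1))

  # Assumed convention: count the observed statistic as one of the replicates.
  p <- (1 + sum(replicates >= observed)) / (1 + num.permutations)
  list(statistic = observed, replicates = replicates, p.value = p)
}

# Stand-in statistic (NOT Ball Divergence): absolute difference in means.
mean_gap <- function(a, b) abs(mean(a) - mean(b))

set.seed(1)
x <- rnorm(50)
y <- rnorm(50, mean = 1)
res <- perm_pvalue(x, y, mean_gap)
res$p.value
```

Under this convention the smallest attainable p-value with 99 permutations is 1/100, which is one reason to raise num.permutations when finer resolution is needed.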

The time complexity of bd.test is around O(R × n^2), where R = num.permutations and n is the sample size.

Value

If num.permutations > 0, bd.test returns an object of class htest containing the following components:

statistic

Ball Divergence statistic.

p.value

the p-value for the test.

replicates

permutation replications of the test statistic.

size

sample sizes.

complete.info

a list mainly containing two vectors: the first holds the Ball Divergence statistics under the different aggregation strategies and weights, and the second holds the p-values of the corresponding tests.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating what type of test was performed.

data.name

description of data.

If num.permutations = 0, bd.test returns only the value of the statistic.
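The components listed above can be read from the returned object with the usual $ accessor. The sketch below assumes the Ball package is installed and skips itself otherwise. It uses only arguments and component names documented on this page.

```r
# Runs only when the Ball package is available.
if (requireNamespace("Ball", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(50)
  y <- rnorm(50, mean = 1)

  # With num.permutations > 0 the result is an htest object.
  res <- Ball::bd.test(x = x, y = y, num.permutations = 99)
  print(res$statistic)         # Ball Divergence statistic
  print(res$p.value)           # p-value for the test
  print(head(res$replicates))  # first few permutation replications
  print(res$size)              # sample sizes

  # With num.permutations = 0 only the statistic is returned.
  stat_only <- Ball::bd.test(x = x, y = y, num.permutations = 0)
  print(stat_only)
}
```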

Note

When K ≥ 3, bd.test computes the "sum", "maxsum", and "max" Ball Divergence statistics simultaneously. The other Ball Divergence statistics and their corresponding p-values are available in the complete.info element of the output. The K-sample example in the Examples section illustrates this.

Author(s)

Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang, Jin Zhu

References

Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang. Ball Divergence: Nonparametric two sample test. Annals of Statistics. 46 (2018), no. 3, 1109–1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077

Jin Zhu, Wenliang Pan, Wei Zheng, Xueqin Wang. Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces. Journal of Statistical Software. 97 (2021), no. 6. doi:10.18637/jss.v097.i06.

See Also

bd

Examples

################# Quick Start #################
set.seed(1)
x <- rnorm(50)
y <- rnorm(50, mean = 1)
# plot(density(x))
# lines(density(y), col = "red")
bd.test(x = x, y = y)

################# Quick Start (multivariate) #################
x <- matrix(rnorm(100), nrow = 50, ncol = 2)
y <- matrix(rnorm(100, mean = 3), nrow = 50, ncol = 2)
# Hypothesis test with Standard Ball Divergence:
bd.test(x = x, y = y)

################# Simulated Non-Hilbert data #################
data("bdvmf")
## Not run: 
library(scatterplot3d)
scatterplot3d(bdvmf[["x"]], color = bdvmf[["group"]], 
              xlab = "X1", ylab = "X2", zlab = "X3")

## End(Not run)
# calculate geodesic distance between samples:
Dmat <- nhdist(bdvmf[["x"]], method = "geodesic")
# hypothesis test with BD:
bd.test(x = Dmat, size = c(150, 150), num.permutations = 99, distance = TRUE)

################# Non-Hilbert Real Data #################
# load data:
data("macaques")
# number of female and male Macaca fascicularis:
table(macaques[["group"]])
# calculate Riemannian shape distance matrix:
Dmat <- nhdist(macaques[["x"]], method = "riemann")
# hypothesis test with BD:
bd.test(x = Dmat, num.permutations = 99, size = c(9, 9), distance = TRUE)

################  K-sample Test  #################
n <- 150
bd.test(rnorm(n), size = c(40, 50, 60))
# alternative input method:
x <- lapply(c(40, 50, 60), rnorm)
res <- bd.test(x)
res
## get all Ball Divergence statistics:
res[["complete.info"]][["statistic"]]
## get all p-values:
res[["complete.info"]][["p.value"]]

################  Testing via approximate limit distribution  #################
## Not run: 
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)
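# p-value from the approximate limit distribution:
res <- bd.test(x, y, method = "limit")
# p-value from the permutation procedure, for comparison:
bd.test(x, y)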

## End(Not run)

################  Formula interface  ################
## Two-sample test
bd.test(extra ~ group, data = sleep)
## K-sample test
bd.test(Sepal.Width ~ Species, data = iris)
bd.test(Sepal.Width ~ Species, data = iris, kbd.type = "max")

[Package Ball version 1.3.13 Index]