bd.test {Ball} | R Documentation |
Ball Divergence based Equality of Distributions Test
Description
Performs the nonparametric two-sample or K-sample Ball Divergence test for equality of multivariate distributions.
Usage
bd.test(x, ...)
## Default S3 method:
bd.test(
x,
y = NULL,
num.permutations = 99,
method = c("permutation", "limit"),
distance = FALSE,
size = NULL,
seed = 1,
num.threads = 0,
kbd.type = c("sum", "maxsum", "max"),
weight = c("constant", "variance"),
...
)
## S3 method for class 'formula'
bd.test(formula, data, subset, na.action, ...)
Arguments
x |
a numeric vector, matrix, data.frame, or a list containing at least two numeric vectors, matrices, or data.frames. |
... |
further arguments to be passed to or from methods. |
y |
a numeric vector, matrix, or data.frame. |
num.permutations |
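the number of permutation replications. When num.permutations = 0, the function returns only the Ball Divergence statistic. Default: num.permutations = 99. |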
method |
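if method = "permutation", the p-value is computed by a permutation procedure; if method = "limit", it is computed from an approximate limit distribution. Default: method = "permutation". |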
distance |
if distance = TRUE, x is treated as a distance matrix. Default: distance = FALSE. |
size |
a vector giving the sample size of each group. |
seed |
the random seed. Default: seed = 1. |
num.threads |
number of threads used in the computation. Default: num.threads = 0. |
kbd.type |
a character string specifying the K-sample Ball Divergence statistic; one of "sum", "maxsum", or "max". Default: kbd.type = "sum". |
weight |
a character string specifying the weight form of the Ball Divergence statistic.
It must be one of "constant" or "variance". Default: weight = "constant". |
formula |
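a formula of the form lhs ~ rhs, where lhs is a numeric variable giving the data values and rhs is a factor giving the corresponding groups. |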
data |
an optional matrix or data frame (or similar) containing the variables in the formula. |
subset |
an optional vector specifying a subset of observations to be used. |
na.action |
a function which indicates what should happen when the data contain NA values. |
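The formula interface can be pictured as splitting the response by the grouping factor, which yields the list form of x. The sketch below does this with base R's model.frame() and split(); the helper formula_to_samples() is hypothetical, and its equivalence to the list interface is an assumption rather than a description of the package internals.

```r
# Hypothetical helper: turn `lhs ~ rhs` into a list of samples, one per
# level of the grouping variable (assumed equivalent to the list input).
formula_to_samples <- function(formula, data) {
  mf <- model.frame(formula, data = data)  # evaluate lhs and rhs in `data`
  values <- mf[[1]]                        # response (lhs): the data values
  groups <- factor(mf[[2]])                # grouping variable (rhs)
  split(values, groups)                    # list with one sample per group
}

two <- formula_to_samples(extra ~ group, data = sleep)
lengths(two)    # two groups of 10 observations each

three <- formula_to_samples(Sepal.Width ~ Species, data = iris)
lengths(three)  # three groups of 50 observations each
```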
Details
bd.test is a nonparametric test for the two-sample or K-sample problem. It can detect distributional differences among K (K \geq 2) samples even when the sample sizes are imbalanced, and it copes well with multivariate and other complex data.
If only x is given, the statistic is computed from the original pooled samples, stacked in a matrix where each row is a multivariate observation, or from the distance matrix when distance = TRUE. The first size[1] rows of x are the first sample, the next size[2] rows of x are the second sample, and so on.
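A minimal base R sketch of how a size vector partitions pooled, row-stacked data into samples. The data are simulated and the variable names (pooled, group, samples, Dmat) are illustrative only; Euclidean distance via dist() merely stands in for whatever metric suits the data when distance = TRUE.

```r
set.seed(1)
size <- c(4, 5, 6)                                  # hypothetical sample sizes
pooled <- matrix(rnorm(sum(size) * 2), ncol = 2)    # 15 bivariate observations

# Sample label of each row: rows 1-4 -> 1, rows 5-9 -> 2, rows 10-15 -> 3
group <- rep(seq_along(size), times = size)

# Recover each sample as a block of consecutive rows
samples <- lapply(split(seq_len(nrow(pooled)), group),
                  function(rows) pooled[rows, , drop = FALSE])
sapply(samples, nrow)   # 4 5 6

# With distance = TRUE, x would be an n x n distance matrix whose rows and
# columns follow the same ordering (Euclidean distance used for illustration)
Dmat <- as.matrix(dist(pooled))
dim(Dmat)               # 15 15
```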
If x is a list, its elements are taken as the samples to be compared; hence, this list must contain at least two numeric data vectors, matrices, or data.frames.
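Going the other way, a list of samples can be stacked into the pooled-matrix-plus-size form. This base R sketch assumes each element is a numeric vector, matrix, or data.frame with observations in rows; x_list is a hypothetical input.

```r
set.seed(1)
x_list <- lapply(c(40, 50, 60), rnorm)    # hypothetical list of 3 samples
stopifnot(length(x_list) >= 2)            # at least two samples are required

# Stack samples row-wise (vectors become one-column matrices) ...
pooled <- do.call(rbind, lapply(x_list, as.matrix))
# ... and record how many rows each sample contributed
size <- vapply(x_list, NROW, integer(1))

dim(pooled)   # 150 1
size          # 40 50 60
```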
bd.test uses the Ball Divergence statistic (see bd) to measure the difference between distributions. With the default method = "permutation", it derives a p-value from num.permutations random permutations. When num.permutations = 0, the function simply returns the test statistic.
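To make the permutation procedure concrete, here is a naive base R sketch of a two-sample test. bd_stat() evaluates the sample Ball Divergence as we read its definition in Pan et al. (2018), using closed balls, from a pooled distance matrix; bd_perm_test() recomputes it under random relabelling. Both helpers are hypothetical and cost O(n^3) per statistic rather than using the package's optimized routines, and the (1 + count)/(1 + R) p-value convention is an assumption; the package's normalization and p-value formula may differ.

```r
# Hypothetical helper: naive sample Ball Divergence from a pooled distance
# matrix D whose first n rows/columns are sample X and the rest sample Y.
bd_stat <- function(D, n) {
  N <- nrow(D)
  stopifnot(n >= 1, n < N)
  ix <- seq_len(n)
  iy <- (n + 1):N
  # Average, over balls centered at `centers`, of the squared gap between the
  # fractions of X and of Y falling in the closed ball B(i, D[i, j])
  part <- function(centers) {
    total <- 0
    for (i in centers) {
      for (j in centers) {
        inside <- D[i, ] <= D[i, j]
        total <- total + (mean(inside[ix]) - mean(inside[iy]))^2
      }
    }
    total / length(centers)^2
  }
  part(ix) + part(iy)   # term with centers in X + term with centers in Y
}

# Hypothetical helper: permutation test by relabelling the pooled points
bd_perm_test <- function(D, n, R = 99, seed = 1) {
  set.seed(seed)
  observed <- bd_stat(D, n)
  replicates <- replicate(R, {
    idx <- sample(nrow(D))             # random permutation of the labels
    bd_stat(D[idx, idx], n)
  })
  # Assumed "+1" convention; the package's formula may differ
  p.value <- (1 + sum(replicates >= observed)) / (1 + R)
  list(statistic = observed, p.value = p.value, replicates = replicates)
}

set.seed(1)
x <- rnorm(20)
y <- rnorm(20, mean = 1)
D <- as.matrix(dist(c(x, y)))
res <- bd_perm_test(D, n = 20)
res$statistic
res$p.value
```

Passing the full pooled distance matrix keeps the sketch metric-agnostic, in the same spirit as distance = TRUE.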
The time complexity of bd.test is approximately O(R \times n^2), where R = num.permutations and n is the sample size.
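As a rough, constant-free reading of this bound, consider the simulated non-Hilbert example in the Examples section (two samples of 150; taking n as the pooled size 300 is an assumption) with the default R = 99:

```latex
R \times n^2 = 99 \times 300^2 = 99 \times 90\,000 = 8\,910\,000 \approx 8.9 \times 10^{6}
```

Doubling n roughly quadruples the work, whereas doubling R only doubles it.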
Value
If num.permutations > 0, bd.test returns an object of class htest containing the following components:
statistic |
Ball Divergence statistic. |
p.value |
the p-value for the test. |
replicates |
permutation replications of the test statistic. |
size |
sample sizes. |
complete.info |
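a list containing all computed Ball Divergence statistics (element statistic) and their corresponding p-values (element p.value). |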
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating what type of test was performed. |
data.name |
description of data. |
If num.permutations = 0, bd.test returns only the value of the statistic.
Note
When K \geq 3, bd.test simultaneously computes the "sum", "maxsum", and "max" Ball Divergence statistics. The other statistics and their corresponding p-values are stored in the complete.info element of the output; the K-sample test in the Examples section illustrates this.
Author(s)
Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang, Jin Zhu
References
Wenliang Pan, Yuan Tian, Xueqin Wang, and Heping Zhang (2018). Ball Divergence: Nonparametric two sample test. Annals of Statistics, 46(3), 1109–1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077
Jin Zhu, Wenliang Pan, Wei Zheng, and Xueqin Wang (2021). Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces. Journal of Statistical Software, 97(6). doi:10.18637/jss.v097.i06.
See Also
Examples
################# Quick Start #################
set.seed(1)
x <- rnorm(50)
y <- rnorm(50, mean = 1)
# plot(density(x))
# lines(density(y), col = "red")
bd.test(x = x, y = y)
################# Quick Start: multivariate #################
x <- matrix(rnorm(100), nrow = 50, ncol = 2)
y <- matrix(rnorm(100, mean = 3), nrow = 50, ncol = 2)
# Hypothesis test with Standard Ball Divergence:
bd.test(x = x, y = y)
################# Simulated Non-Hilbert data #################
data("bdvmf")
## Not run:
library(scatterplot3d)
scatterplot3d(bdvmf[["x"]], color = bdvmf[["group"]],
xlab = "X1", ylab = "X2", zlab = "X3")
## End(Not run)
# calculate geodesic distance between samples:
Dmat <- nhdist(bdvmf[["x"]], method = "geodesic")
# hypothesis test with BD:
bd.test(x = Dmat, size = c(150, 150), num.permutations = 99, distance = TRUE)
################# Non-Hilbert Real Data #################
# load data:
data("macaques")
# number of female and male Macaca fascicularis:
table(macaques[["group"]])
# calculate Riemannian shape distance matrix:
Dmat <- nhdist(macaques[["x"]], method = "riemann")
# hypothesis test with BD:
bd.test(x = Dmat, num.permutations = 99, size = c(9, 9), distance = TRUE)
################ K-sample Test #################
n <- 150
bd.test(rnorm(n), size = c(40, 50, 60))
# alternative input method:
x <- lapply(c(40, 50, 60), rnorm)
res <- bd.test(x)
res
## get all Ball Divergence statistics:
res[["complete.info"]][["statistic"]]
## get all test result:
res[["complete.info"]][["p.value"]]
################ Testing via approximate limit distribution #################
## Not run:
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)
res <- bd.test(x, y, method = "limit")
res
# compare with the default permutation test:
bd.test(x, y)
## End(Not run)
################ Formula interface ################
## Two-sample test
bd.test(extra ~ group, data = sleep)
## K-sample test
bd.test(Sepal.Width ~ Species, data = iris)
bd.test(Sepal.Width ~ Species, data = iris, kbd.type = "max")