bd {Ball}    R Documentation
Ball Divergence statistic
Description
Compute the Ball Divergence statistic, a generic dispersion measure in Banach spaces that quantifies the difference between two or more distributions.
Usage
bd(
x,
y = NULL,
distance = FALSE,
size = NULL,
num.threads = 1,
kbd.type = c("sum", "maxsum", "max")
)
Arguments
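x | a numeric vector, matrix, data.frame, or a list containing at least two numeric vectors, matrices, or data.frames.
y | a numeric vector, matrix, or data.frame.
distance | if distance = TRUE, x and y are treated as a dist object or a symmetric numeric matrix of distances between samples; otherwise they are treated as data. Default: distance = FALSE.
size | a vector recording the sample size of each group.
num.threads | number of threads. If
kbd.type | a character string specifying the type of K-sample Ball Divergence statistic: one of "sum", "maxsum", or "max" (see Details).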
Details
Given samples that contain no missing values, bd returns the Ball Divergence statistic.
If distance = TRUE, the arguments x and y can be a dist object or a symmetric numeric matrix recording the distances between samples; otherwise, these arguments are treated as data.
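To illustrate what a distance input looks like, the sketch below pools two samples and builds their symmetric Euclidean distance matrix with base R's dist(). The helper name pooled_distance is hypothetical, and passing one pooled matrix together with a size vector (rather than separate x and y) is an assumption based on the size argument, so the bd call is left commented out.

```r
# Hypothetical helper (not part of Ball): stack two samples row-wise and
# return their pooled pairwise Euclidean distance matrix plus group sizes.
pooled_distance <- function(x, y) {
  pooled <- rbind(as.matrix(x), as.matrix(y))  # vectors become 1-column matrices
  d <- unname(as.matrix(dist(pooled)))         # symmetric (n + m) x (n + m) matrix
  list(distance = d, size = c(NROW(x), NROW(y)))
}

set.seed(1)
p <- pooled_distance(rnorm(5), rnorm(3))
# Assumed usage (not run here, relies on the size argument interpretation):
# bd(p$distance, size = p$size, distance = TRUE)
```

Any other metric could replace dist() here, as long as the resulting matrix stays symmetric with a zero diagonal.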
The Ball Divergence statistic measures the difference between the distributions of two datasets in Banach spaces. The population Ball Divergence is proven to be zero if and only if the two underlying distributions are identical (under mild conditions; Pan et al., 2018).
The Ball Divergence statistic is defined as follows.
Consider two independent samples \{x_{1}, \ldots, x_{n}\} with probability measure \mu and \{y_{1}, \ldots, y_{m}\} with probability measure \nu, where the observations within each sample are i.i.d. Let \rho denote the metric of the space and let
\delta(x,y,z)=I(z\in \bar{B}(x, \rho(x,y))),
where \delta(x,y,z) indicates whether z is located in the closed ball \bar{B}(x, \rho(x,y)) with center x and radius \rho(x, y).
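A minimal base-R sketch of the indicator \delta, assuming \rho is the Euclidean distance; the names euclidean and delta are hypothetical and not part of Ball. Because the ball is closed, a point exactly on the boundary counts as inside.

```r
# Assumption: rho is the Euclidean distance (Ball accepts other distances
# through distance = TRUE, so this choice is only for illustration).
euclidean <- function(a, b) sqrt(sum((a - b)^2))

# delta(x, y, z) = I(z in closed ball centred at x with radius rho(x, y))
delta <- function(x, y, z, rho = euclidean) {
  as.integer(rho(x, z) <= rho(x, y))  # "<=" makes the ball closed
}

delta(0, 2, 1)                    # 1: inside the ball of radius 2
delta(0, 2, 2)                    # 1: on the boundary of the closed ball
delta(c(0, 0), c(3, 4), c(6, 0))  # 0: distance 6 exceeds radius 5
```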
We denote:
A_{ij}^{X}=\frac{1}{n}\sum_{u=1}^{n}{\delta(X_i,X_j,X_u)}, \quad A_{ij}^{Y}=\frac{1}{m}\sum_{v=1}^{m}{\delta(X_i,X_j,Y_v)},
C_{kl}^{X}=\frac{1}{n}\sum_{u=1}^{n}{\delta(Y_k,Y_l,X_u)}, \quad C_{kl}^{Y}=\frac{1}{m}\sum_{v=1}^{m}{\delta(Y_k,Y_l,Y_v)}.
Here A_{ij}^X is the proportion of the sample \{x_{1}, \ldots, x_{n}\} located in the ball \bar{B}(X_i,\rho(X_i,X_j)), and A_{ij}^Y is the proportion of the sample \{y_{1}, \ldots, y_{m}\} located in the same ball \bar{B}(X_i,\rho(X_i,X_j)). Likewise, C_{kl}^X and C_{kl}^Y are the corresponding proportions of the two samples located in the ball \bar{B}(Y_k,\rho(Y_k,Y_l)).
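These proportions can be computed directly from a pooled distance matrix. In the sketch below (Euclidean distance assumed; the helper ball_proportions is hypothetical), row i of each result is the centre and column j selects the radius, matching the indices of A_{ij} and C_{kl}.

```r
# Hypothetical helper: for each centre, the share of `targets` lying in the
# closed ball B(centre, rho(centre, radius_pts[j])), for every j.
ball_proportions <- function(D, centres, radius_pts, targets) {
  do.call(rbind, lapply(centres, function(i) {
    r <- D[i, radius_pts]                     # candidate radii for centre i
    colMeans(outer(D[i, targets], r, "<="))   # [u, j] = target u inside ball j
  }))
}

x <- c(0, 1, 3)
y <- c(2, 5)
D <- unname(as.matrix(dist(c(x, y))))   # pooled Euclidean distances
ix <- seq_along(x)                      # positions of the x sample
iy <- length(x) + seq_along(y)          # positions of the y sample

AX <- ball_proportions(D, ix, ix, ix)   # A^X: centres and radii from x, count x
AY <- ball_proportions(D, ix, ix, iy)   # A^Y: same balls, count y
CX <- ball_proportions(D, iy, iy, ix)   # C^X: centres and radii from y, count x
CY <- ball_proportions(D, iy, iy, iy)   # C^Y: same balls, count y
```

For instance, AX[1, 2] is 2/3: the ball centred at 0 with radius 1 contains the points 0 and 1 of x but not 3.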
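The Ball Divergence statistic is defined as
D_{n,m}=A_{n,m}+C_{n,m},
where
A_{n,m}=\frac{1}{n^{2}}\sum_{i,j=1}^{n}{(A_{ij}^{X}-A_{ij}^{Y})^{2}}, \quad C_{n,m}=\frac{1}{m^{2}}\sum_{k,l=1}^{m}{(C_{kl}^{X}-C_{kl}^{Y})^{2}}.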
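Assuming, as in Pan et al. (2018), that A_{n,m}=\frac{1}{n^{2}}\sum_{i,j}(A_{ij}^{X}-A_{ij}^{Y})^{2} and C_{n,m}=\frac{1}{m^{2}}\sum_{k,l}(C_{kl}^{X}-C_{kl}^{Y})^{2}, the two-sample statistic can be evaluated naively in base R. This is an illustrative sketch with Euclidean distance (the function name ball_divergence_stat is hypothetical), not Ball's compiled implementation, and its values are not guaranteed to match bd exactly.

```r
# Naive sketch of D_{n,m} = A_{n,m} + C_{n,m} under the Euclidean distance.
# Assumption: A and C are the averaged squared proportion differences.
ball_divergence_stat <- function(x, y) {
  x <- as.matrix(x); y <- as.matrix(y)
  n <- nrow(x); m <- nrow(y)
  D <- unname(as.matrix(dist(rbind(x, y))))  # pooled distance matrix
  ix <- seq_len(n); iy <- n + seq_len(m)

  # Rows: centres; columns: radius points from the same group as the centres;
  # entries: share of `targets` inside the corresponding closed ball.
  props <- function(centres, targets) {
    do.call(rbind, lapply(centres, function(i) {
      colMeans(outer(D[i, targets], D[i, centres], "<="))
    }))
  }

  A <- mean((props(ix, ix) - props(ix, iy))^2)  # (1/n^2) * sum over i, j
  C <- mean((props(iy, ix) - props(iy, iy))^2)  # (1/m^2) * sum over k, l
  A + C
}
```

With this sketch, ball_divergence_stat(c(0, 1), c(10, 11)) evaluates to 1.25, and passing the same sample twice gives exactly 0. The direct evaluation costs O((n+m)^3) time, so it is only meant for small illustrative samples.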
Ball Divergence can be generalized to the K-sample test problem. Suppose we have K groups of samples, where the k-th group contains n_{k} samples.
The K-sample Ball Divergence statistic can be defined in three ways. The first directly sums the two-sample Ball Divergence statistics over all pairs of groups (kbd.type = "sum"):
\sum_{1 \leq k < l \leq K}{D_{n_{k},n_{l}}}.
The second finds the group with the largest total difference from the others (kbd.type = "maxsum"):
\max_{t}{\sum_{s=1, s \neq t}^{K}{D_{n_{s}, n_{t}}}}.
The third aggregates the K-1 most significant two-sample Ball Divergence statistics (kbd.type = "max"):
\sum_{k=1}^{K-1}{D_{(k)}},
where D_{(1)}, \ldots, D_{(K-1)} are the K-1 largest two-sample Ball Divergence statistics among \{D_{n_s, n_t} \mid 1 \leq s < t \leq K\}. When K=2, all three statistics reduce to the two-sample Ball Divergence statistic.
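Given the pairwise two-sample statistics, the three aggregation rules are simple to express. The sketch below takes a symmetric K x K matrix whose [s, t] entry is D_{n_s,n_t} and mirrors the kbd.type options; the function name kbd_aggregate is hypothetical, and how Ball computes the pairwise statistics internally is not assumed here.

```r
# Hypothetical helper: combine pairwise two-sample statistics (symmetric
# K x K matrix, zero diagonal) into one K-sample statistic.
kbd_aggregate <- function(pairwise, type = c("sum", "maxsum", "max")) {
  type <- match.arg(type)
  K <- nrow(pairwise)
  pair_stats <- pairwise[upper.tri(pairwise)]  # D_{n_s, n_t} for s < t
  switch(type,
    sum    = sum(pair_stats),
    maxsum = max(rowSums(pairwise) - diag(pairwise)),  # max_t sum_{s != t}
    max    = sum(sort(pair_stats, decreasing = TRUE)[seq_len(K - 1)])
  )
}

# Four groups: pairs (1, 2) and (3, 4) differ strongly, other pairs barely.
pw <- matrix(1, 4, 4)
pw[1, 2] <- pw[2, 1] <- 10
pw[3, 4] <- pw[4, 3] <- 9
diag(pw) <- 0
sapply(c("sum", "maxsum", "max"), function(type) kbd_aggregate(pw, type))
# returns c(sum = 23, maxsum = 12, max = 20)
```

The example shows why the choices differ: "maxsum" focuses on one outlying group, while "max" rewards several strongly differing pairs even when they do not share a group.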
See bd.test for a test of distribution equality based on the Ball Divergence.
Value
bd | the Ball Divergence statistic
Author(s)
Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang
References
Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang. Ball Divergence: Nonparametric two sample test. Ann. Statist. 46 (2018), no. 3, 1109–1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077
See Also
Examples
############# Ball Divergence #############
library(Ball)
set.seed(1)  # make the simulated samples reproducible
x <- rnorm(50)
y <- rnorm(50)
bd(x, y)