kld_est_brnn {kldest} | R Documentation
Bias-reduced generalized k-nearest-neighbour KL divergence estimation
Description
This is the bias-reduced generalized k-NN based KL divergence estimator from Wang et al. (2009), specified in Eq. (29).
Usage
kld_est_brnn(X, Y, max.k = 100, warn.max.k = TRUE, eps = 0)
Arguments
X, Y        n-by-d and m-by-d matrices, representing n samples from the true
            distribution P and m samples from the approximate distribution Q,
            both in d dimensions. Vector input is treated as a column matrix.

max.k       Maximum number of nearest neighbours to compute (default: 100).
            A larger max.k may yield a more accurate estimate (see
            warn.max.k), but increases the computational cost.

warn.max.k  If TRUE (the default), warns if max.k is reached, which may
            indicate a bias in the KL divergence estimate; in that case,
            increasing max.k may help.

eps         Error bound in the nearest neighbour search. A value of eps = 0
            (the default) implies an exact nearest neighbour search; for
            eps > 0, approximate nearest neighbours are used, which may be
            faster for high-dimensional problems (see the example below).
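As an illustration of the eps argument, the snippet below compares an exact and an approximate nearest neighbour search; the simulated 5-D Gaussian data and the value eps = 0.1 are made up for demonstration, not a recommendation:

# Illustrative data: 100 and 200 samples from two 5-D Gaussians
set.seed(1)
X <- matrix(rnorm(500), ncol = 5)
Y <- matrix(rnorm(1000, sd = 1.5), ncol = 5)

kld_est_brnn(X, Y)             # exact NN search (eps = 0, the default)
kld_est_brnn(X, Y, eps = 0.1)  # approximate NN search, typically faster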
Details
Finite sample bias reduction is achieved by an adaptive choice of the number of nearest neighbours. Fixing the number of nearest neighbours upfront, as done in kld_est_nn(), may result in very different distances \rho^l_i, \nu^k_i of a datapoint x_i to its l-th nearest neighbour in X and its k-th nearest neighbour in Y, respectively, which may lead to unequal biases in nearest neighbour density estimation, especially in a high-dimensional setting.

To overcome this issue, the numbers of neighbours l, k are chosen here so as to render \rho^{l_i}_i and \nu^{k_i}_i comparable, by taking the largest possible numbers of neighbours l_i, k_i whose distances do not exceed \delta_i := \max(\rho^1_i, \nu^1_i).

Since the bias reduction explicitly uses both samples X and Y, one-sample estimation is not possible with this method.
Reference: Wang, Kulkarni and VerdĂș, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009). DOI: https://doi.org/10.1109/TIT.2009.2016060
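To make the adaptive neighbour choice concrete, here is a minimal, unoptimized R sketch of the estimator in Eq. (29) of Wang et al. (2009). The name kld_brnn_sketch is hypothetical and written for illustration only; it uses brute-force distances with no max.k cap or eps-approximate search, unlike the actual kld_est_brnn() implementation:

# Sketch of the bias-reduced k-NN KL divergence estimator, Eq. (29) in
# Wang et al. (2009). Assumes X (n-by-d) ~ P and Y (m-by-d) ~ Q, n >= 2.
kld_brnn_sketch <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); m <- nrow(Y); d <- ncol(X)
  s <- 0
  for (i in seq_len(n)) {
    # Euclidean distances from x_i to all other points in X and to all of Y
    rho <- sqrt(colSums((t(X[-i, , drop = FALSE]) - X[i, ])^2))
    nu  <- sqrt(colSums((t(Y) - X[i, ])^2))
    # adaptive neighbour numbers: count all points within
    # delta_i = max(rho^1_i, nu^1_i), the larger of the two 1-NN distances
    delta <- max(min(rho), min(nu))
    l_i <- sum(rho <= delta)
    k_i <- sum(nu <= delta)
    rho_l <- sort(rho, partial = l_i)[l_i]  # l_i-th NN distance in X
    nu_k  <- sort(nu, partial = k_i)[k_i]   # k_i-th NN distance in Y
    # per-point contribution to Eq. (29)
    s <- s + d * log(nu_k / rho_l) + digamma(l_i) - digamma(k_i)
  }
  s / n + log(m / (n - 1))
}

Applied to the samples in the Examples section below, this sketch should closely match kld_est_brnn() whenever max.k is not reached.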
Value
A scalar, the estimated Kullback-Leibler divergence \hat D_{KL}(P||Q).
Examples
# KL-D between one or two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_nn(X, Y)
kld_est_nn(X, q = q)
kld_est_nn(X, Y, k = 5)
kld_est_nn(X, q = q, k = 5)
kld_est_brnn(X, Y)
# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
kld_gaussian(mu1 = rep(0, 2), sigma1 = diag(2),
             mu2 = rep(0, 2), sigma2 = matrix(c(1, 1, 1, 2), nrow = 2))
kld_est_nn(X, Y)
kld_est_nn(X, Y, k = 5)
kld_est_brnn(X, Y)