kld_est_nn {kldest}    R Documentation

k-nearest neighbour KL divergence estimator

Description

This function estimates the Kullback-Leibler divergence D_{KL}(P||Q) between two continuous distributions P and Q, using nearest-neighbour (NN) density estimates within a Monte Carlo approximation of D_{KL}(P||Q).
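For reference, write p and q for the densities of P and Q and x_1, ..., x_n for the rows of X (drawn from P). The divergence is an expectation under P, so it can be approximated by averaging over the sample. Here \hat p is an NN density estimate built from X. In the one-sample problem q is evaluated directly; in the two-sample problem it is replaced by an NN estimate \hat q built from Y. This is a sketch of the general idea, not the exact formula used internally (see Details).

```latex
D_{KL}(P\|Q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
  = \mathbb{E}_{x\sim P}\!\left[\log\frac{p(x)}{q(x)}\right]
  \approx \frac{1}{n}\sum_{i=1}^{n}\log\frac{\hat p(x_i)}{\hat q(x_i)}
```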

Usage

kld_est_nn(X, Y = NULL, q = NULL, k = 1L, eps = 0, log.q = FALSE)

Arguments

X, Y

n-by-d and m-by-d matrices, representing n samples from the true distribution P and m samples from the approximate distribution Q, both in d dimensions. Vector input is treated as a column matrix. Y can be omitted (left as NULL) if q is specified (see below).
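The input conventions above can be mirrored in a few lines of base R. The helper prepare_samples is hypothetical and not part of kldest. It only illustrates that a plain vector becomes an n-by-1 matrix, and that X and Y must share the same dimension d.

```r
# Hypothetical helper (not part of kldest): mirror the documented input conventions.
prepare_samples <- function(X, Y = NULL) {
  X <- as.matrix(X)                      # a numeric vector becomes an n-by-1 column matrix
  if (!is.null(Y)) {
    Y <- as.matrix(Y)
    if (ncol(X) != ncol(Y)) stop("X and Y must have the same number of columns d")
  }
  list(X = X, Y = Y,
       n = nrow(X),
       m = if (is.null(Y)) NA_integer_ else nrow(Y),
       d = ncol(X))
}

s <- prepare_samples(rnorm(100), rnorm(50))
c(n = s$n, m = s$m, d = s$d)             # n = 100, m = 50, d = 1
```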

q

The density function of the approximate distribution Q. Either Y or q must be specified.

k

The number of nearest neighbours to consider for NN density estimation. Larger values of k generally increase the bias but decrease the variance of the estimator. Defaults to k = 1.
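To make the role of k concrete, here is a minimal base-R sketch of the standard k-NN density estimate p̂(x_i) = k / ((n - 1) V_d r_k(x_i)^d). Here r_k(x_i) is the distance from x_i to its k-th nearest neighbour among the other n - 1 points, and V_d is the volume of the unit ball in d dimensions. The function name knn_density is hypothetical. It uses brute-force Euclidean distances, and kldest's internal implementation may differ in details.

```r
# Sketch: k-NN density estimate at each sample point (brute force, Euclidean distance).
knn_density <- function(X, k = 1L) {
  X <- as.matrix(X)
  n <- nrow(X); d <- ncol(X)
  D <- as.matrix(dist(X))                          # n-by-n pairwise distances
  diag(D) <- Inf                                   # a point is not its own neighbour
  r_k <- apply(D, 1, function(row) sort(row)[k])   # distance to the k-th nearest neighbour
  V_d <- pi^(d / 2) / gamma(d / 2 + 1)             # volume of the unit d-ball
  k / ((n - 1) * V_d * r_k^d)
}

set.seed(1)
x   <- rnorm(1000)
p1  <- knn_density(x, k = 1)
p20 <- knn_density(x, k = 20)
# Spread of the log-ratio to the true N(0, 1) density, for each k:
c(k1 = sd(log(p1 / dnorm(x))), k20 = sd(log(p20 / dnorm(x))))
```

With k = 20 the log-ratio fluctuates much less than with k = 1, at the cost of averaging over a wider neighbourhood (more smoothing, hence more bias). This is the trade-off described above.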

eps

Error bound in the nearest neighbour search. A value of eps = 0 (the default) implies an exact nearest neighbour search; for eps > 0, approximate nearest neighbours are sought, which may be somewhat faster for high-dimensional problems.
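This page does not state the exact guarantee behind eps. Assume it carries the usual meaning in approximate nearest-neighbour search, where each reported k-th neighbour distance is at most (1 + eps) times the exact one. Under that assumption, the effect on a log-ratio of two NN distances, such as those in the two-sample estimator, can be bounded. The sketch below checks that bound on made-up stand-in distances. The name max_log_ratio_shift is hypothetical.

```r
# Assumption: approximate search reports r_hat with r <= r_hat <= (1 + eps) * r.
# Then each term d * log(nu / rho) moves by at most d * log(1 + eps).
max_log_ratio_shift <- function(d, eps) d * log1p(eps)

set.seed(2)
d <- 3; eps <- 0.2
nu  <- rexp(50)                            # made-up stand-ins for exact k-NN distances
rho <- rexp(50)
nu_hat  <- nu  * runif(50, 1, 1 + eps)     # simulated approximate distances
rho_hat <- rho * runif(50, 1, 1 + eps)
shift <- d * log(nu_hat / rho_hat) - d * log(nu / rho)
max(abs(shift)) <= max_log_ratio_shift(d, eps)   # TRUE
max_log_ratio_shift(d = 10, eps = 0.1)           # about 0.953
```

Because the worst-case shift grows linearly in d, any speed-up from eps > 0 in high dimensions comes with a larger possible perturbation of each term.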

log.q

If TRUE, function q is the log-density rather than the density of the approximate distribution Q (default: log.q = FALSE).
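The two conventions for q differ only in whether a log is taken before averaging. The hypothetical helper log_q_at below shows this. It assumes q is vectorised over a vector of 1-D points, and the page does not say how q is called in d dimensions. One plausible reason to supply a log-density is numerical: far in the tails a density can underflow to 0, while its logarithm stays finite.

```r
# Hypothetical helper (not part of kldest): evaluate log q at the points x,
# whether q returns densities (log.q = FALSE) or log-densities (log.q = TRUE).
log_q_at <- function(q, x, log.q = FALSE) {
  vals <- q(x)
  if (log.q) vals else log(vals)
}

x       <- rnorm(5)
dens    <- function(x) dnorm(x, mean = 1, sd = 2)
logdens <- function(x) dnorm(x, mean = 1, sd = 2, log = TRUE)
all.equal(log_q_at(dens, x), log_q_at(logdens, x, log.q = TRUE))  # TRUE

# Tail underflow: the density is 0 in double precision, the log-density is not.
log(dnorm(40))            # -Inf
dnorm(40, log = TRUE)     # finite
```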

Details

Input for estimation is a sample X from P and either the density function q of Q (one-sample problem) or a sample Y from Q (two-sample problem). In the two-sample problem, the function implements the estimator in Eq. (5) of Wang et al. (2009). In the one-sample problem, the asymptotic bias (the expectation of a Gamma distribution) is subtracted; see Pérez-Cruz (2008), Eq. (18).
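The two cases can be sketched in base R with brute-force distances. This is an illustration under stated assumptions, not kldest's code. The two-sample version uses the usual form of the Wang et al. estimator, (d/n) Σ log(ν_k(i)/ρ_k(i)) + log(m/(n - 1)). Here ρ_k(i) is the k-th NN distance from x_i within X (excluding x_i), and ν_k(i) is the k-th NN distance from x_i into Y. The one-sample version assumes the subtracted bias is log(k) - ψ(k), which is the asymptotic bias of the log k-NN density estimate when the rescaled k-NN volume is Gamma(k, 1) distributed. All function names here are hypothetical.

```r
# Distance from each row of `from` to its k-th nearest row of `to` (Euclidean, brute force).
kth_nn_dist <- function(from, to, k, exclude_self = FALSE) {
  sapply(seq_len(nrow(from)), function(i) {
    r <- sqrt(colSums((t(to) - from[i, ])^2))   # distances from row i to all rows of `to`
    if (exclude_self) r[i] <- Inf                # `from` and `to` are the same sample
    sort(r)[k]
  })
}

# Two-sample estimate: samples X ~ P and Y ~ Q.
kld_nn_two_sample <- function(X, Y, k = 1L) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); m <- nrow(Y); d <- ncol(X)
  rho <- kth_nn_dist(X, X, k, exclude_self = TRUE)   # k-th NN distance within X
  nu  <- kth_nn_dist(X, Y, k)                        # k-th NN distance from X into Y
  d * mean(log(nu / rho)) + log(m / (n - 1))
}

# One-sample estimate: sample X ~ P and a log-density function log_q of Q.
kld_nn_one_sample <- function(X, log_q, k = 1L) {
  X <- as.matrix(X)
  n <- nrow(X); d <- ncol(X)
  rho <- kth_nn_dist(X, X, k, exclude_self = TRUE)
  V_d <- pi^(d / 2) / gamma(d / 2 + 1)                       # volume of the unit d-ball
  log_p_hat <- log(k) - log(n - 1) - log(V_d) - d * log(rho)  # log k-NN density estimate
  mean(log_p_hat - log_q(X)) - (log(k) - digamma(k))         # subtract the assumed bias
}

set.seed(0)
X <- rnorm(1000)
Y <- rnorm(1000, mean = 1, sd = 2)
true_kl <- log(2) + (1 + 1) / 8 - 1 / 2    # closed form for N(0, 1) vs N(1, 2^2), about 0.443
D2 <- kld_nn_two_sample(X, Y, k = 5)
D1 <- kld_nn_one_sample(X, function(x) dnorm(x, mean = 1, sd = 2, log = TRUE), k = 5)
c(exact = true_kl, two_sample = D2, one_sample = D1)
```

Both estimates should land close to the closed-form value for samples of this size. The brute-force search costs O(n m) time and is only meant for small examples.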

References:

Wang, Kulkarni and Verdú, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009).

Pérez-Cruz, "Kullback-Leibler Divergence Estimation of Continuous Distributions", IEEE International Symposium on Information Theory (2008).

Value

A scalar, the estimated Kullback-Leibler divergence \hat D_{KL}(P||Q).

Examples

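# KL divergence between 1-D Gaussians, estimated from one sample (X and the
# density q) or from two samples (X and Y):
set.seed(0)
X <- rnorm(100)                                  # sample from P = N(0, 1)
Y <- rnorm(100, mean = 1, sd = 2)                # sample from Q = N(1, 2^2)
q <- function(x) dnorm(x, mean = 1, sd = 2)      # density of Q
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)  # exact value (sigma = variance)
kld_est_nn(X, Y)                                 # two-sample estimate, k = 1
kld_est_nn(X, q = q)                             # one-sample estimate, k = 1
kld_est_nn(X, Y, k = 5)
kld_est_nn(X, q = q, k = 5)
kld_est_brnn(X, Y)                               # bias-reduced NN estimator, for comparison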


# KL divergence between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)            # correlated with Y1, so Cov(Y) = [1 1; 1 2]
X <- cbind(X1, X2)               # sample from P = N(0, I_2)
Y <- cbind(Y1, Y2)               # sample from Q = N(0, [1 1; 1 2])
kld_gaussian(mu1 = rep(0, 2), sigma1 = diag(2),
             mu2 = rep(0, 2), sigma2 = matrix(c(1, 1, 1, 2), nrow = 2))  # exact value
kld_est_nn(X, Y)
kld_est_nn(X, Y, k = 5)
kld_est_brnn(X, Y)

[Package kldest version 1.0.0 Index]