kld_est {kldest} | R Documentation
Kullback-Leibler divergence estimator for discrete, continuous or mixed data.
Description
For two mixed continuous/discrete distributions with densities p and q, and denoting x = (x_c, x_d), the Kullback-Leibler divergence D_{KL}(p||q) is given as

D_{KL}(p||q) = \sum_{x_d} \int p(x_c,x_d) \log\left(\frac{p(x_c,x_d)}{q(x_c,x_d)}\right)dx_c.
Conditioning on the discrete variables x_d, this can be re-written as

D_{KL}(p||q) = \sum_{x_d} p(x_d) D_{KL}\big(p(\cdot|x_d)||q(\cdot|x_d)\big) + D_{KL}\big(p_{x_d}||q_{x_d}\big).

Here, the terms D_{KL}\big(p(\cdot|x_d)||q(\cdot|x_d)\big) are approximated via nearest neighbour- or kernel-based density estimates on the datasets X and Y, stratified by the discrete variables, and D_{KL}\big(p_{x_d}||q_{x_d}\big) is approximated using relative frequencies.
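The decomposition follows by factoring the joint densities, p(x_c,x_d) = p(x_c|x_d)\,p(x_d) and q(x_c,x_d) = q(x_c|x_d)\,q(x_d), and splitting the logarithm:

\begin{aligned}
D_{KL}(p||q)
&= \sum_{x_d} \int p(x_c|x_d)\,p(x_d)\,
   \log\left(\frac{p(x_c|x_d)}{q(x_c|x_d)}
   \cdot \frac{p(x_d)}{q(x_d)}\right) dx_c \\
&= \sum_{x_d} p(x_d) \int p(x_c|x_d)\,
   \log\left(\frac{p(x_c|x_d)}{q(x_c|x_d)}\right) dx_c
 + \sum_{x_d} p(x_d)\,\log\left(\frac{p(x_d)}{q(x_d)}\right) \\
&= \sum_{x_d} p(x_d)\, D_{KL}\big(p(\cdot|x_d)||q(\cdot|x_d)\big)
 + D_{KL}\big(p_{x_d}||q_{x_d}\big),
\end{aligned}

where the second step uses \int p(x_c|x_d)\,dx_c = 1.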
Usage
kld_est(
  X,
  Y = NULL,
  q = NULL,
  estimator.continuous = kld_est_nn,
  estimator.discrete = kld_est_discrete,
  vartype = NULL
)
Arguments
X, Y
    n-by-d and m-by-d data frames or matrices (multivariate samples), or vectors (univariate samples), representing n samples from the true distribution P and m samples from the approximate distribution Q in d dimensions. Y is not used if q is specified.

q
    The density function of the approximate distribution Q, used instead of the sample Y in the one-sample setting. For mixed data, q is a list with components cond (the conditional density of the continuous variables given the discrete ones) and disc (the probability mass function of the discrete variables), as in the one-sample example below.

estimator.continuous, estimator.discrete
    KL divergence estimators for continuous and discrete data, respectively. Both are functions with two arguments, X and Y (two-sample setting) or X and q (one-sample setting). Defaults are kld_est_nn and kld_est_discrete.

vartype
    A length-d character vector, with vartype[i] = "c" meaning the i-th variable is continuous and vartype[i] = "d" meaning it is discrete. The default NULL treats numeric columns as continuous and all other columns as discrete.
Value
A scalar, the estimated Kullback-Leibler divergence \hat D_{KL}(P||Q).
Examples
# 2D example, two samples
set.seed(0)
X <- data.frame(cont  = rnorm(10),
                discr = c(rep('a', 4), rep('b', 6)))
Y <- data.frame(cont  = c(rnorm(5), rnorm(5, sd = 2)),
                discr = c(rep('a', 5), rep('b', 5)))
kld_est(X, Y)

# 2D example, one sample
set.seed(0)
X <- data.frame(cont  = rnorm(10),
                discr = c(rep(0, 4), rep(1, 6)))
q <- list(cond = function(xc, xd) dnorm(xc, mean = xd, sd = 1),
          disc = function(xd) dbinom(xd, size = 1, prob = 0.5))
kld_est(X, q = q, vartype = c("c", "d"))
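To make the stratified construction from the Description concrete, the following sketch recomputes the two-sample estimate by hand. This is an illustrative reimplementation under the assumption that kld_est_nn accepts univariate numeric vectors, not the package internals: split both samples by the discrete variable, apply the nearest-neighbour estimator per stratum, and add the relative-frequency KL divergence of the discrete part.

```r
# Hypothetical manual version of the mixed-data estimate (illustration only)
library(kldest)

set.seed(0)
X <- data.frame(cont  = rnorm(10),
                discr = c(rep('a', 4), rep('b', 6)))
Y <- data.frame(cont  = c(rnorm(5), rnorm(5, sd = 2)),
                discr = c(rep('a', 5), rep('b', 5)))

lv  <- union(unique(X$discr), unique(Y$discr))
p_d <- sapply(lv, function(l) mean(X$discr == l))  # relative frequencies in X
q_d <- sapply(lv, function(l) mean(Y$discr == l))  # relative frequencies in Y

# Continuous part: frequency-weighted per-stratum nearest-neighbour estimates
kl_cont <- sum(sapply(lv, function(l) {
  p_d[l] * kld_est_nn(X$cont[X$discr == l], Y$cont[Y$discr == l])
}))

# Discrete part: KL divergence between the stratum frequencies
kl_disc <- sum(p_d * log(p_d / q_d))

kl_cont + kl_disc  # compare with kld_est(X, Y)
```

Note that the discrete part is only finite when every stratum observed in X also occurs in Y, which holds here.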