kld_ci_subsampling {kldest} | R Documentation
Uncertainty of KL divergence estimate using Politis/Romano's subsampling bootstrap.
Description
This function computes a confidence interval for KL divergence based on the subsampling bootstrap introduced by Politis and Romano. See Details for theoretical properties of this method.
Usage
kld_ci_subsampling(
X,
Y = NULL,
q = NULL,
estimator = kld_est_nn,
B = 500L,
alpha = 0.05,
subsample.size = function(x) x^(2/3),
convergence.rate = sqrt,
method = c("quantile", "se"),
include.boot = FALSE,
n.cores = 1L,
...
)
Arguments
X, Y
    Samples from the true distribution P (via X) and, in the two-sample problem, from the approximate distribution Q (via Y). Y is not needed if the density q is specified instead.
q
    The density function of the approximate distribution Q (one-sample problem). Either Y or q must be specified.
estimator
    The Kullback-Leibler divergence estimation method; a function expecting two inputs (X and Y, or X and q, depending on which arguments are provided). Defaults to kld_est_nn.
B
    Number of bootstrap replicates (default: 500L).
alpha
    Error level; the resulting confidence interval has nominal coverage 1 - alpha. Defaults to 0.05.
subsample.size
    A function specifying the size of the subsamples as a function of the sample size, defaults to function(x) x^(2/3).
convergence.rate
    A function computing the convergence rate of the estimator as a function of sample sizes. Defaults to sqrt.
method
    Either "quantile" (the default), which inverts the subsampling distribution of the centered, rescaled estimator, or "se", which uses a normal approximation based on the subsampling standard error.
include.boot
    Boolean; if TRUE, the KL divergence estimates on the subsamples are included in the output. Defaults to FALSE.
n.cores
    Number of cores to use in parallel computing (defaults to 1L, i.e. no parallelization).
...
    Arguments passed on to estimator.
Details
In general terms, letting b_n be the subsample size for a sample of size n, and \tau_n the convergence rate of the estimator, a confidence interval calculated by subsampling has asymptotic coverage 1 - \alpha as long as b_n/n \rightarrow 0, b_n \rightarrow \infty and \frac{\tau_{b_n}}{\tau_n} \rightarrow 0.
In many cases, the convergence rate of the nearest-neighbour based KL divergence estimator is \tau_n = \sqrt{n}, and the condition on the subsample size reduces to b_n/n \rightarrow 0 and b_n \rightarrow \infty. By default, b_n = n^{2/3}. In a two-sample problem, n and b_n are replaced by the effective sample sizes n_\text{eff} = \min(n,m) and b_{n,\text{eff}} = \min(b_n, b_m).
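To make the rate conditions concrete: with the default b_n = n^{2/3} and \tau_n = \sqrt{n}, the ratio \tau_{b_n}/\tau_n equals n^{-1/6}, so all three conditions hold. A quick numerical check (a sketch in base R, independent of the package):

```r
# Check the subsampling conditions for the default choices
# b_n = n^(2/3) and tau_n = sqrt(n):
#   b_n -> Inf, b_n/n -> 0, and tau_{b_n}/tau_n = n^(-1/6) -> 0.
n     <- 10^(2:6)
b_n   <- n^(2/3)
ratio <- sqrt(b_n) / sqrt(n)   # identical to n^(-1/6)
cbind(n, b_n = round(b_n), "b_n/n" = b_n / n, "tau ratio" = ratio)
```
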
Reference:
Politis and Romano, "Large sample confidence regions based on subsamples under minimal assumptions", The Annals of Statistics, Vol. 22, No. 4 (1994).
Value
A list with the following fields:
- "est": the estimated KL divergence.
- "ci": a length-2 vector containing the lower and upper limits of the estimated confidence interval.
- "boot": a length-B numeric vector with KL divergence estimates on the bootstrap subsamples; only included if include.boot = TRUE.
Examples
# 1D Gaussian (one- and two-sample problems)
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_nn(X, Y = Y)
kld_est_nn(X, q = q)
kld_ci_subsampling(X, Y)$ci
kld_ci_subsampling(X, q = q)$ci
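The quantile-method interval described in Details can be sketched for a generic one-sample estimator. This is an illustration only, not the package's implementation; subsample_ci_sketch and the example using mean are hypothetical:

```r
# Sketch of a Politis/Romano subsampling CI (quantile method) for a
# generic one-sample estimator; illustration only, not kldest code.
subsample_ci_sketch <- function(x, estimator, B = 500L, alpha = 0.05,
                                subsample.size = function(n) n^(2/3),
                                convergence.rate = sqrt) {
  n   <- length(x)
  b   <- ceiling(subsample.size(n))
  est <- estimator(x)
  # Re-estimate on B subsamples drawn without replacement
  boot <- replicate(B, estimator(sample(x, size = b)))
  # Centered, rate-scaled subsample statistics approximate the
  # distribution of tau_n * (est - theta)
  scaled <- convergence.rate(b) * (boot - est)
  qs <- quantile(scaled, probs = c(1 - alpha / 2, alpha / 2))
  list(est  = est,
       ci   = unname(est - qs / convergence.rate(n)),
       boot = boot)
}

set.seed(0)
res <- subsample_ci_sketch(rnorm(1000), mean)
res$ci   # a two-sided 95% interval for the mean
```

kld_ci_subsampling applies the same scheme to KL divergence estimators, with one- or two-sample inputs and effective sample sizes as described in Details.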