convergence_rate {kldest}    R Documentation

Empirical convergence rate of a KL divergence estimator


Subsampling-based confidence intervals computed by kld_ci_subsampling() require the convergence rate of the KL divergence estimator as an input. The default rate of 0.5 assumes that the variance term dominates the bias term. For high-dimensional problems, the convergence rate may be lower, depending on the data. This function estimates the convergence rate empirically.
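The idea of an empirical rate can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy estimator (a sample mean), the hand-picked subsample sizes and the use of the interquartile range as a spread measure are all assumptions made for illustration. If the spread of subsample estimates shrinks like b^(-beta) in the subsample size b, a log-log regression recovers beta as the negative slope.

```r
set.seed(42)
x <- rexp(5000)                    # toy data
estimator <- function(s) mean(s)   # hypothetical stand-in for a KL divergence estimator

b_sizes <- c(50, 100, 200, 400)    # subsample sizes, chosen by hand for this sketch
B <- 500                           # number of subsamples per subsample size

# Spread (interquartile range) of the subsample estimates at each subsample size
spread <- sapply(b_sizes, function(b) {
  est <- replicate(B, estimator(sample(x, b)))  # subsampling without replacement
  IQR(est)
})

# If spread is proportional to b^(-beta), log(spread) is linear in log(b) with slope -beta
fit <- lm(log(spread) ~ log(b_sizes))
beta_hat <- -unname(coef(fit)[2])
beta_hat
```

For the sample mean the estimated exponent should land near 0.5; an estimator whose bias dominates would be expected to give a smaller value.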


  Y = NULL,
  q = NULL,
  n.sizes = 4,
  spacing.factor = 1.5,
  typical.subsample = function(n) sqrt(n),
  B = 500L,
  plot = FALSE



A KL divergence estimator.

X, Y

n-by-d and m-by-d data frames or matrices (multivariate samples), or numeric/character vectors (univariate samples, i.e. d = 1), representing n samples from the true distribution P and m samples from the approximate distribution Q in d dimensions. Y can be left blank if q is specified (see below).


q

The density function of the approximate distribution Q. Either Y or q must be specified. If the distributions are all continuous or all discrete, q can be directly specified as the probability density/mass function. However, for mixed continuous/discrete distributions, q must be given in decomposed form, q(y_c, y_d) = q_{c|d}(y_c | y_d) q_d(y_d), specified as a named list with field cond for the conditional density q_{c|d}(y_c | y_d) (a function that expects two arguments y_c and y_d) and disc for the discrete marginal density q_d(y_d) (a function that expects one argument y_d). If such a decomposition is not available, it may be preferable to instead simulate a large sample from Q and use the two-sample syntax.
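As an illustration of the decomposed form, the following sketch builds such a named list for a made-up mixed distribution. The distribution itself (a Bernoulli discrete part and a normal continuous part) and the helper q_joint are hypothetical; how the package passes y_c and y_d internally (scalars, vectors or rows) is not specified here and is an assumption.

```r
# Hypothetical Q: y_d ~ Bernoulli(0.3), y_c | y_d ~ Normal(mean = y_d, sd = 1)
q <- list(
  cond = function(y_c, y_d) dnorm(y_c, mean = y_d, sd = 1),  # q_{c|d}(y_c | y_d)
  disc = function(y_d) dbinom(y_d, size = 1, prob = 0.3)     # q_d(y_d)
)

# Joint density q(y_c, y_d) = q_{c|d}(y_c | y_d) * q_d(y_d), evaluated at one point
q_joint <- function(y_c, y_d) q$cond(y_c, y_d) * q$disc(y_d)
q_joint(0.5, 1)
```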


n.sizes

Number of different subsample sizes to use (default: 4).


spacing.factor

Multiplicative factor controlling the spacing of sample sizes (default: 1.5).


typical.subsample

A function that produces a typical subsample size, used as the geometric mean of subsample sizes (default: sqrt(n)).
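One way to read n.sizes, spacing.factor and typical.subsample together is as a geometric grid of subsample sizes centred on the typical size. The sketch below is an assumption about how such a grid could be built, not a copy of the package's code; in particular, the centring of the exponents and the final rounding are illustrative choices.

```r
n <- 1000
n.sizes <- 4
spacing.factor <- 1.5
typical.subsample <- function(n) sqrt(n)

# Exponents centred on zero, so the geometric mean of the sizes equals typical.subsample(n)
exponents <- seq_len(n.sizes) - (n.sizes + 1) / 2
b <- typical.subsample(n) * spacing.factor^exponents

round(b)            # candidate subsample sizes (rounding is an assumption)
exp(mean(log(b)))   # geometric mean of the unrounded sizes, equal to sqrt(n)
```

Consecutive sizes differ by the factor spacing.factor, so a larger factor spreads the sizes further apart on the log scale.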


B

Number of subsamples to draw per subsample size (default: 500).


plot

A boolean (default: FALSE) controlling whether to produce a diagnostic plot visualizing the fit.



Politis, Romano and Wolf, "Subsampling", Chapter 8 (1999), for theory.

The implementation has been adapted from lecture notes by C. J. Geyer,


A scalar, the parameter \beta in the empirical convergence rate n^{-\beta} of the estimator to the true KL divergence. It can be used in the convergence.rate argument of kld_ci_subsampling() as convergence.rate = function(n) n^beta.
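A sketch of how the returned value might be passed on is given below. It assumes that the kldest package is installed and that kld_ci_subsampling() accepts the samples via X and Y in the same two-sample syntax as convergence_rate(); only the convergence.rate argument is taken from the text above, and all other arguments are left at their defaults.

```r
set.seed(1)
X <- rnorm(1000)                     # sample from the true distribution P
Y <- rnorm(1000, mean = 1, sd = 2)   # sample from the approximate distribution Q

# Only run if kldest is available; the X/Y argument names are an assumption
if (requireNamespace("kldest", quietly = TRUE)) {
  beta <- kldest::convergence_rate(kldest::kld_est_nn, X = X, Y = Y)
  ci <- kldest::kld_ci_subsampling(X = X, Y = Y,
                                   convergence.rate = function(n) n^beta)
  print(ci)
}
```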


    # NN method usually has a convergence rate around 0.5:
    convergence_rate(kld_est_nn, X = rnorm(1000), Y = rnorm(1000, mean = 1, sd = 2))

[Package kldest version 1.0.0 Index]