R: Discretize Density

discretize_density {latent2likert}

R Documentation

Discretize Density

Description

Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.

Usage

discretize_density(density_fn, n_levels, eps = 1e-06)

Arguments

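`density_fn`	probability density function of the continuous random variable, supplied as an R function.
`n_levels`	number of possible outcomes of the discrete random variable (K in Details).
`eps`	convergence threshold for the algorithm (\varepsilon in Details).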

Details

The function addresses the problem of transforming a continuous random variable X into a discrete random variable Y with minimal distortion. Distortion is measured as mean-squared error (MSE):

\text{E}\left[ (X - Y)^2 \right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x) \left( x - r_{k} \right)^2 \, dx

where:

f_{X}: the probability density function of X,
K: the number of possible outcomes of Y,
x_{k}: the endpoints of the intervals that partition the domain of X,
r_{k}: the representation points of the intervals.
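To make the distortion formula concrete, the following minimal sketch evaluates it numerically for a fixed set of endpoints and representation points. The helper `quantizer_mse` is hypothetical and not part of latent2likert; it assumes that `stats::integrate` handles the supplied density well over each interval.

```r
# Hypothetical helper (not part of latent2likert): MSE of a quantizer whose
# intervals are bounded by `endp` and represented by the points `repr`.
quantizer_mse <- function(density_fn, endp, repr) {
  bounds <- c(-Inf, endp, Inf)  # x_0 = -Inf and x_K = Inf
  cell_mse <- vapply(seq_along(repr), function(k) {
    # Integral of f_X(x) * (x - r_k)^2 over (x_{k-1}, x_k)
    stats::integrate(
      function(x) density_fn(x) * (x - repr[k])^2,
      lower = bounds[k], upper = bounds[k + 1]
    )$value
  }, numeric(1))
  sum(cell_mse)
}

# Two-level quantizer of the standard normal, split at zero
quantizer_mse(stats::dnorm, endp = 0, repr = c(-1, 1))
```

For this particular choice the integral has the closed form 2 - 2 * sqrt(2 / pi), which can serve as a sanity check on the numerical result.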

This problem is solved using the following iterative procedure:

1. Start with an arbitrary initial set of representation points: r_{1} < r_{2} < \dots < r_{K}.
2. Repeat steps 3 and 4 until the improvement in MSE falls below a given \varepsilon.
3. Calculate the endpoints as x_{k} = (r_{k+1} + r_{k})/2 for each k = 1, \dots, K-1, and set x_{0} and x_{K} to -\infty and \infty, respectively.
4. Update the representation points by setting r_{k} equal to the conditional mean of X given X \in (x_{k-1}, x_{k}) for each k = 1, \dots, K.

With each execution of step (3) and step (4), the MSE decreases or remains the same. Because the MSE is nonnegative and nonincreasing, it converges to a limit, so the improvement between successive iterations tends to zero. The algorithm stops once this improvement is less than a given \varepsilon > 0, which guarantees termination after a finite number of iterations.
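The iterative procedure can be sketched in a few lines of R. This is an illustration under stated assumptions, not the package's source code: the function name `lloyd_max_sketch`, the starting points on [-1, 1], and the `max_iter` safeguard are all hypothetical, and numerical integration via `stats::integrate` is assumed to be adequate for the density. The names of the returned list mirror those in the Value section purely for readability.

```r
# Hypothetical sketch of the iterative procedure; not the package's code.
lloyd_max_sketch <- function(density_fn, n_levels, eps = 1e-06,
                             max_iter = 1000) {
  # Integral of g(x) * f_X(x) over the interval (lower, upper)
  cell_integral <- function(g, lower, upper) {
    stats::integrate(function(x) g(x) * density_fn(x), lower, upper)$value
  }
  # Step 1: arbitrary increasing start (assumes the density has mass nearby)
  repr <- seq(-1, 1, length.out = n_levels)
  mse_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 3: endpoints are midpoints of neighbouring representation points
    endp <- (repr[-1] + repr[-n_levels]) / 2
    bounds <- c(-Inf, endp, Inf)
    lower <- bounds[-length(bounds)]
    upper <- bounds[-1]
    # Step 4: representation points become conditional means per interval
    prob <- mapply(function(a, b) cell_integral(function(x) 1, a, b),
                   lower, upper)
    first_moment <- mapply(function(a, b) cell_integral(function(x) x, a, b),
                           lower, upper)
    repr <- first_moment / prob
    # Distortion of the current quantizer
    mse <- sum(mapply(
      function(a, b, r) cell_integral(function(x) (x - r)^2, a, b),
      lower, upper, repr
    ))
    # Step 2: stop once the improvement in MSE falls below eps
    if (mse_old - mse < eps) break
    mse_old <- mse
  }
  list(prob = prob, endp = endp, repr = repr, dist = mse)
}

res <- lloyd_max_sketch(stats::dnorm, n_levels = 4)
res$repr
res$dist
```

The sketch would fail if an interval carried zero probability, because the conditional mean in step 4 is then undefined; a production implementation would need to guard against that case.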

This procedure is known as the Lloyd-Max algorithm. It was originally developed for scalar quantization (Lloyd, 1982) and is closely related to the k-means algorithm. Local convergence has been proven for log-concave density functions by Kieffer (1983). Many common probability distributions are log-concave, including the normal and skew normal distributions, as shown by Azzalini (1985).
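Log-concavity of a given density can be checked numerically on a grid before relying on the convergence result. The sketch below does this for the skew normal density used in the Examples section; the helper `log_skew_normal`, the grid range, and the step size are illustrative assumptions, and a grid check is evidence rather than a proof.

```r
# Log of the skew normal density 2 * dnorm(x) * pnorm(alpha * x),
# computed on the log scale for numerical stability
log_skew_normal <- function(x, alpha = 0.5) {
  log(2) + stats::dnorm(x, log = TRUE) + stats::pnorm(alpha * x, log.p = TRUE)
}

x <- seq(-6, 6, by = 0.01)
# A concave function has nonpositive second differences on an even grid
second_diff <- diff(log_skew_normal(x), differences = 2)
is_log_concave_on_grid <- all(second_diff <= 0)
is_log_concave_on_grid
```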

Value

A list containing:

prob: discrete probability distribution.
endp: endpoints of intervals that partition the continuous domain.
repr: representation points of the intervals.
dist: distortion measured as the mean-squared error (MSE).

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.

Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error weighting function. IEEE Transactions on Information Theory 29(1), 42–47.

Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137.

Examples

discretize_density(density_fn = stats::dnorm, n_levels = 5)
discretize_density(density_fn = function(x) {
  2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)

[Package latent2likert version 1.2.1 Index]