discretize_density {latent2likert} | R Documentation
Discretize Density
Description
Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.
Usage
discretize_density(density_fn, n_levels, eps = 1e-06)
Arguments
- density_fn: probability density function.
- n_levels: cardinality of the set of all possible outcomes.
- eps: convergence threshold for the algorithm.
Details
The function addresses the problem of transforming a continuous random variable $X$ into a discrete random variable $Y$ with minimal distortion. Distortion is measured as mean-squared error (MSE):

$$
\text{E}\left[ (X - Y)^2 \right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x) \left( x - r_{k} \right)^2 \, dx
$$
where:
- $f_{X}$ is the probability density function of $X$,
- $K$ is the number of possible outcomes of $Y$,
- $x_{k}$ are endpoints of intervals that partition the domain of $X$,
- $r_{k}$ are representation points of the intervals.
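As an illustration of the distortion formula, the sum of integrals can be evaluated numerically with `stats::integrate()`. The helper name `quantizer_mse` and its argument layout are assumptions made for this sketch, not part of the package: `endp` is taken to hold the endpoints $x_{0}, \dots, x_{K}$ and `repr` the representation points $r_{1}, \dots, r_{K}$.

```r
# Hypothetical helper (not part of latent2likert): MSE of a quantizer with
# representation points `repr` and interval endpoints `endp`, where
# length(endp) == length(repr) + 1 and the outer endpoints may be infinite.
quantizer_mse <- function(density_fn, endp, repr) {
  K <- length(repr)
  terms <- vapply(seq_len(K), function(k) {
    # Integral of f_X(x) * (x - r_k)^2 over the k-th interval
    stats::integrate(
      function(x) density_fn(x) * (x - repr[k])^2,
      lower = endp[k], upper = endp[k + 1]
    )$value
  }, numeric(1))
  sum(terms)
}

# One level placed at 0 for a standard normal: the MSE equals the variance.
mse_one <- quantizer_mse(stats::dnorm, endp = c(-Inf, Inf), repr = 0)

# Two levels split at 0, placed at the conditional means -/+ sqrt(2 / pi).
r <- sqrt(2 / pi)
mse_two <- quantizer_mse(stats::dnorm, endp = c(-Inf, 0, Inf), repr = c(-r, r))
```

For the standard normal, the two-level value works out analytically to $1 - 2/\pi$, which gives a convenient check on the numerical result.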
This problem is solved using the following iterative procedure:
1. Start with an arbitrary initial set of representation points: $r_{1} < r_{2} < \dots < r_{K}$.
2. Repeat the following steps until the improvement in MSE falls below a given $\varepsilon$.
3. Calculate endpoints as $x_{k} = (r_{k+1} + r_{k})/2$ for each $k = 1, \dots, K-1$ and set $x_{0}$ and $x_{K}$ to $-\infty$ and $\infty$, respectively.
4. Update representation points by setting $r_{k}$ equal to the conditional mean of $X$ given $X \in (x_{k-1}, x_{k})$ for each $k = 1, \dots, K$.
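The procedure above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `lloyd_max_sketch`, the starting points `seq(-1, 1, ...)`, the `max_iter` safeguard, and the use of `stats::integrate()` for the integrals are all choices made for this example. The returned list mirrors the documented component names only for readability.

```r
# Illustrative sketch of the iteration (hypothetical function, not the
# latent2likert implementation).
lloyd_max_sketch <- function(density_fn, n_levels, eps = 1e-6, max_iter = 1000) {
  integ <- function(g, a, b) stats::integrate(g, lower = a, upper = b)$value

  # Step 1: arbitrary increasing starting points (assumed choice)
  repr <- seq(-1, 1, length.out = n_levels)
  prob <- numeric(n_levels)
  mse_old <- Inf

  # Step 2: repeat until the improvement in MSE falls below eps
  for (iter in seq_len(max_iter)) {
    # Step 3: endpoints are midpoints of neighbouring representation points
    endp <- c(-Inf, (repr[-1] + repr[-n_levels]) / 2, Inf)

    # Step 4: representation points become conditional means
    for (k in seq_len(n_levels)) {
      prob[k] <- integ(density_fn, endp[k], endp[k + 1])
      repr[k] <- integ(function(x) x * density_fn(x), endp[k], endp[k + 1]) / prob[k]
    }

    # Distortion for the current endpoints and representation points
    mse <- sum(vapply(seq_len(n_levels), function(k) {
      integ(function(x) density_fn(x) * (x - repr[k])^2, endp[k], endp[k + 1])
    }, numeric(1)))

    if (mse_old - mse < eps) break
    mse_old <- mse
  }

  list(prob = prob, endp = endp, repr = repr, dist = mse)
}

res <- lloyd_max_sketch(stats::dnorm, n_levels = 5)
```

Numerical integration keeps the sketch short, but it assumes the density is well behaved enough for `stats::integrate()` and that every interval keeps positive probability.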
With each execution of step (3) and step (4), the MSE decreases or remains the same. As the MSE is nonnegative, it approaches a limit, so the improvement between successive iterations tends to zero. The algorithm terminates when the improvement in MSE is less than a given $\varepsilon > 0$, which therefore happens after a finite number of iterations.
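The monotonicity claim can be checked with a short derivation, given here as a sketch that assumes $f_{X}$ is positive on each interval and that differentiation under the integral sign is permitted. For fixed endpoints, each term of the MSE is minimized by the conditional mean; for fixed representation points with $r_{k} < r_{k+1}$, each $x$ is best assigned to the nearer point, which yields the midpoint rule.

```latex
% Step (4): optimal r_k for fixed endpoints
\frac{\partial}{\partial r_{k}} \int_{x_{k-1}}^{x_{k}} f_{X}(x) (x - r_{k})^2 \, dx
  = -2 \int_{x_{k-1}}^{x_{k}} f_{X}(x) (x - r_{k}) \, dx = 0
\quad \Longrightarrow \quad
r_{k} = \frac{\int_{x_{k-1}}^{x_{k}} x \, f_{X}(x) \, dx}{\int_{x_{k-1}}^{x_{k}} f_{X}(x) \, dx}
      = \text{E}\left[ X \mid X \in (x_{k-1}, x_{k}) \right]

% Step (3): optimal endpoint between r_k and r_{k+1} for fixed representation points
(x - r_{k})^2 \le (x - r_{k+1})^2
\quad \Longleftrightarrow \quad
x \le \frac{r_{k} + r_{k+1}}{2}
```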
This procedure is known as the Lloyd-Max algorithm. It was initially used for scalar quantization and is closely related to the k-means algorithm. Local convergence has been proven for log-concave density functions by Kieffer (1983). Many common probability distributions are log-concave, including the normal and skew normal distributions, as shown by Azzalini (1985).
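Log-concavity can be spot-checked numerically by verifying that the second differences of the log-density are negative on a grid. The sketch below does this for a skew normal density with shape parameter 0.5; the grid range, the step size, and the helper name `log_skew_normal` are assumptions of this example, and a finite-grid check is only a sanity test, not a proof.

```r
# Log-density of a skew normal with shape parameter alpha (hypothetical helper),
# computed on the log scale to avoid underflow in the tails.
log_skew_normal <- function(x, alpha = 0.5) {
  log(2) + stats::dnorm(x, log = TRUE) + stats::pnorm(alpha * x, log.p = TRUE)
}

x <- seq(-5, 5, by = 0.01)
log_f <- log_skew_normal(x)

# A concave function has negative second differences on an evenly spaced grid.
second_diff <- diff(log_f, differences = 2)
is_log_concave_on_grid <- all(second_diff < 0)
```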
Value
A list containing:
- prob: discrete probability distribution.
- endp: endpoints of intervals that partition the continuous domain.
- repr: representation points of the intervals.
- dist: distortion measured as the mean-squared error (MSE).
References
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.
Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29, 42–47.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137.
Examples
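# Standard normal density discretized into 5 levels
discretize_density(density_fn = stats::dnorm, n_levels = 5)

# Skew normal density with shape parameter 0.5 discretized into 4 levels
discretize_density(density_fn = function(x) {
  2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)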