R: Mode estimation

mix_mode {BayesMultiMode}

R Documentation

Mode estimation

Description

Mode estimation in univariate mixture distributions. The fixed-point algorithm of Carreira-Perpinan (2000) is used for Gaussian mixtures. The Modal EM algorithm of Li et al. (2007) is used for other continuous mixtures. A basic algorithm is used for discrete mixtures (see Cross et al., 2024).

Usage

mix_mode(
  mixture,
  tol_mixp = 0,
  tol_x = 1e-06,
  tol_conv = 1e-08,
  type = "all",
  inside_range = TRUE
)

Arguments

`mixture`	An object of class `mixture` generated with `mixture()`.
`tol_mixp`	Components with a mixture proportion below `tol_mixp` are discarded when estimating modes. The largest component is never discarded, so it is not possible to discard all components. Should be between `0` and `1`; default is `0`.
`tol_x`	(for continuous mixtures) Tolerance parameter for the distance between modes; default is `1e-6`. If two modes are closer than `tol_x`, the first estimated mode is kept.
`tol_conv`	(for continuous mixtures) Tolerance parameter for convergence of the algorithm; default is `1e-8`.
`type`	(for discrete mixtures) Type of modes, either `"unique"` or `"all"` (the latter includes flat modes); default is `"all"`.
`inside_range`	Should modes outside of `mixture$range` be discarded? Default is `TRUE`. Modes outside the range sometimes occur with very small components when K is large.

Details

This function finds modes in a univariate mixture defined as:

p(.) = \sum_{k=1}^{K}\pi_k p_k(.),

where p_k is a probability density function (continuous mixtures) or a probability mass function (discrete mixtures).

Fixed-point algorithm

Following Carreira-Perpinan (2000), a mode x is found by iterating the two steps:

(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},

(ii) \quad x^{(n+1)} = f(x^{(n)}),

with

f(x) = (\sum_k p(k|x) \sigma_k^{-2})^{-1}\sum_k p(k|x) \sigma_k^{-2} \mu_k,

until convergence, that is, until abs(x^{(n+1)}-x^{(n)})< \text{tol}_\text{conv}, where \text{tol}_\text{conv} is an argument with default value 1e-8. Following Carreira-Perpinan (2000), the algorithm is started at each component location. Runs started from different components can converge to the same mode up to a small numerical difference, so modes that are closer than a given tolerance are treated as identical; this tolerance can be controlled with the argument tol_x.
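The two steps above can be sketched in base R. This is a minimal illustration rather than the package's implementation: `fixed_point_modes` and its `max_iter` guard are hypothetical names introduced here, the update weights each posterior by the inverse variance 1/sigma_k^2, and the sketch does not check that a stationary point is a maximum rather than a minimum.

```r
# Hypothetical helper, not part of BayesMultiMode: fixed-point mode search
# for a univariate Gaussian mixture (weights p, means mu, std. deviations sigma).
fixed_point_modes <- function(p, mu, sigma, tol_conv = 1e-8, tol_x = 1e-6,
                              max_iter = 10000) {
  modes <- numeric(0)
  for (start in mu) {                     # start at each component location
    x <- start
    for (iter in seq_len(max_iter)) {     # max_iter is an assumed safety cap
      post <- p * dnorm(x, mu, sigma)     # step (i): posterior p(k | x)
      post <- post / sum(post)
      w <- post / sigma^2                 # inverse-variance weighted posteriors
      x_new <- sum(w * mu) / sum(w)       # step (ii): x <- f(x)
      converged <- abs(x_new - x) < tol_conv
      x <- x_new
      if (converged) break
    }
    # keep x only if no previously stored mode lies within tol_x of it
    if (!any(abs(modes - x) < tol_x)) modes <- c(modes, x)
  }
  modes
}

modes <- fixed_point_modes(p = c(0.5, 0.5), mu = c(0, 5), sigma = c(1, 2))
```

For this two-component mixture the sketch should return one mode close to each component mean, since the components are well separated.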

MEM algorithm

Following Li et al. (2007), a mode x is found by iterating the two steps:

(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},

(ii) \quad x^{(n+1)} = \text{argmax}_x \sum_k p(k|x^{(n)}) \text{log} p_k(x),

until convergence, that is, until abs(x^{(n+1)}-x^{(n)})< \text{tol}_\text{conv}, where \text{tol}_\text{conv} is an argument with default value 1e-8. The algorithm is started at each component location. Runs started from different components can converge to the same mode up to a small numerical difference, so modes which are closer than tol_x are merged.
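A rough base-R sketch of this iteration follows. It is an illustration under stated assumptions, not the package's code: `mem_modes`, `comp_pdf` and `max_iter` are hypothetical names, the maximisation in step (ii) is delegated to `optimize()` over a fixed interval, and the convergence tolerance is set looser than the documented default because the numerical optimiser limits how small a step can be resolved. In step (ii) the posteriors are held fixed at the current iterate and the maximisation is over the argument of log p_k.

```r
# Hypothetical helper, not part of BayesMultiMode.
# comp_pdf(x) must return the K component densities p_k(x) at a scalar x.
mem_modes <- function(p, comp_pdf, loc, range, tol_conv = 1e-6, tol_x = 1e-6,
                      max_iter = 500) {
  modes <- numeric(0)
  for (start in loc) {                    # start at each component location
    x <- start
    for (iter in seq_len(max_iter)) {     # max_iter is an assumed safety cap
      post <- p * comp_pdf(x)             # step (i): posterior p(k | x)
      post <- post / sum(post)
      # step (ii): maximise sum_k p(k | x_n) * log p_k(y) over y
      q <- function(y) sum(post * log(comp_pdf(y)))
      x_new <- optimize(q, interval = range, maximum = TRUE,
                        tol = tol_conv / 10)$maximum
      converged <- abs(x_new - x) < tol_conv
      x <- x_new
      if (converged) break
    }
    # merge: keep x only if no stored mode lies within tol_x of it
    if (!any(abs(modes - x) < tol_x)) modes <- c(modes, x)
  }
  modes
}

# Two location-scale Student t components (illustrative values)
mu <- c(0, 6); s <- c(1, 2); nu <- c(3, 100)
comp_pdf <- function(x) dt((x - mu) / s, df = nu) / s
modes <- mem_modes(p = c(0.8, 0.2), comp_pdf, loc = mu, range = c(-5, 15))
```

Using a generic one-dimensional optimiser keeps the sketch independent of the component family; a closed-form update, where one exists, would be cheaper.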

Discrete method

By definition, modes must satisfy either:

p(y_{m}-1) < p(y_{m}) > p(y_{m}+1);

p(y_{m}-1) < p(y_{m}) = p(y_{m}+1) = \ldots = p(y_{m}+l-1) > p(y_{m}+l).

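The algorithm evaluates each location point against these two conditions. The first condition describes a single-point mode; the second describes a flat mode, where l consecutive points share the same maximal probability.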
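The two conditions can be checked with a short base-R sketch. The helper `discrete_modes` and its `include_flat` flag are hypothetical and only loosely mirror the `type` argument; the sketch assumes `y` holds consecutive integers, treats the pmf as 0 outside `y`, and compares probabilities with exact equality, which may need a tolerance for computed pmf values.

```r
# Hypothetical helper, not part of BayesMultiMode.
# y: consecutive integer support points; py: mixture pmf evaluated at y.
discrete_modes <- function(y, py, include_flat = TRUE) {
  runs <- rle(py)                         # runs of equal probability (plateaus)
  n <- length(runs$values)
  ends <- cumsum(runs$lengths)            # last index of each run
  starts <- ends - runs$lengths + 1       # first index of each run
  v <- c(0, runs$values, 0)               # assumption: pmf is 0 outside y
  # a run is a mode if it is strictly higher than both neighbouring runs
  is_peak <- v[2:(n + 1)] > v[1:n] & v[2:(n + 1)] > v[3:(n + 2)]
  # optionally drop plateaus and keep single-point modes only
  if (!include_flat) is_peak <- is_peak & runs$lengths == 1
  idx <- unlist(Map(seq, starts[is_peak], ends[is_peak]))
  y[idx]
}

y  <- 0:7
py <- c(0.05, 0.20, 0.10, 0.15, 0.15, 0.05, 0.20, 0.10)
all_modes    <- discrete_modes(y, py)                        # 1, 3, 4, 6
strict_modes <- discrete_modes(y, py, include_flat = FALSE)  # 1, 6
```

Here the points 3 and 4 form a flat mode (the second condition with l = 2), while 1 and 6 satisfy the first condition.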

Value

A list of class mix_mode containing:

`mode_estimates`	estimates of the mixture modes.
`algo`	algorithm used for mode estimation.
`dist`	from `mixture`.
`dist_type`	type of mixture distribution, i.e. continuous or discrete.
`pars`	from `mixture`.
`pdf_func`	from `mixture`.
`K`	from `mixture`.
`nb_var`	from `mixture`.

References

Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024). “Bayesian mode inference for discrete distributions in economics and finance.” Economics Letters, 235, 111579. ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.

Carreira-Perpinan MA (2000). “Mode-finding for mixtures of Gaussian distributions.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1318–1323. ISSN 1939-3539, doi:10.1109/34.888716.

Li J, Ray S, Lindsay BG (2007). “A Nonparametric Statistical Approach to Clustering via Mode Identification.” Journal of Machine Learning Research, 8, 1687-1723.

Examples


# Example with a normal distribution ====================================
mu = c(0,5)
sigma = c(1,2)
p = c(0.5,0.5)

params = c(eta = p, mu = mu, sigma = sigma)
mix = mixture(params, dist = "normal", range = c(-5,15))
modes = mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with a skew normal =============================================
xi = c(0,6)
omega = c(1,2)
alpha = c(0,0)
p = c(0.8,0.2)
params = c(eta = p, xi = xi, omega = omega, alpha = alpha)
dist = "skew_normal"

mix = mixture(params, dist = dist, range = c(-5,15))
modes = mix_mode(mix)
# summary(modes)
# plot(modes)

# Example with an arbitrary continuous distribution ======================
xi = c(0,6)
omega = c(1,2)
alpha = c(0,0)
nu = c(3,100)
p = c(0.8,0.2)
params = c(eta = p, mu = xi, sigma = omega, xi = alpha, nu = nu)

pdf_func <- function(x, pars) {
  sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"])
}

mix = mixture(params, pdf_func = pdf_func,
              dist_type = "continuous", loc = "mu", range = c(-5,15))
modes = mix_mode(mix)

# summary(modes)
# plot(modes, from = -4, to = 4)

# Example with a Poisson distribution ====================================
lambda = c(0.1,10)
p = c(0.5,0.5)
params = c(eta = p, lambda = lambda)
dist = "poisson"


mix = mixture(params, range = c(0,50), dist = dist)

modes = mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with an arbitrary discrete distribution =======================
mu = c(20,5)
size = c(20,0.5)
p = c(0.5,0.5)
params = c(eta = p, mu = mu, size = size)


pmf_func <- function(x, pars) {
  dnbinom(x, mu = pars["mu"], size = pars["size"])
}

mix = mixture(params, range = c(0, 50),
              pdf_func = pmf_func, dist_type = "discrete")
modes = mix_mode(mix)

# summary(modes)
# plot(modes)

[Package BayesMultiMode version 0.7.1 Index]