mix_mode {BayesMultiMode}    R Documentation

Mode estimation

Description

Estimates the modes of a univariate mixture distribution. The fixed-point algorithm of Carreira-Perpinan (2000) is used for Gaussian mixtures. The Modal EM (MEM) algorithm of Li et al. (2007) is used for other continuous mixtures. A basic algorithm is used for discrete mixtures; see Cross et al. (2024).

Usage

mix_mode(
  mixture,
  tol_mixp = 0,
  tol_x = 1e-06,
  tol_conv = 1e-08,
  type = "all",
  inside_range = TRUE
)

Arguments

mixture

An object of class mixture generated with mixture().

tol_mixp

Components with a mixture proportion below tol_mixp are discarded when estimating modes. This does not apply to the largest component, so it is not possible to discard all components. Should be between 0 and 1; default is 0.

tol_x

(for continuous mixtures) Tolerance parameter for the distance between modes; default is 1e-6. If two modes are closer than tol_x, the first estimated mode is kept.

tol_conv

(for continuous mixtures) Tolerance parameter for convergence of the algorithm; default is 1e-8.

type

(for discrete mixtures) Type of modes, either "unique" or "all" (the latter includes flat modes); default is "all".

inside_range

Should modes outside of mixture$range be discarded? Default is TRUE. Modes outside the range sometimes occur with very small components when K is large.

Details

This function finds modes in a univariate mixture defined as:

p(.) = \sum_{k=1}^{K}\pi_k p_k(.),

where \pi_k is the mixture proportion of component k and p_k is a probability density function or a probability mass function.
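
As a minimal sketch of the definition above, the mixture density can be evaluated in base R by summing the weighted component densities. The helper name mix_pdf and the choice of normal components are assumptions made for illustration; they are not part of BayesMultiMode.

```r
# Hypothetical helper (not a package function): evaluates
# p(x) = sum_k eta_k * p_k(x) for normal components.
mix_pdf <- function(x, eta, mu, sigma) {
  # One weighted density per component, evaluated at every x
  dens <- sapply(seq_along(eta), function(k) eta[k] * dnorm(x, mu[k], sigma[k]))
  # sapply returns a matrix for vector x and a plain vector for scalar x
  if (is.matrix(dens)) rowSums(dens) else sum(dens)
}

eta <- c(0.5, 0.5)
mu <- c(0, 5)
sigma <- c(1, 2)

# Evaluate on a grid and check that the density integrates to about 1
step <- 0.01
x_grid <- seq(-5, 15, by = step)
p_grid <- mix_pdf(x_grid, eta, mu, sigma)
area <- sum(p_grid) * step
```

The Riemann sum in area is only a sanity check that the proportions and component densities were combined correctly.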

Fixed-point algorithm Following Carreira-Perpinan (2000), a mode x is found by iterating the two steps:

(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},

(ii) \quad x^{(n+1)} = f(x^{(n)}),

with

f(x) = (\sum_k p(k|x) \sigma_k^{-2})^{-1}\sum_k p(k|x) \sigma_k^{-2} \mu_k,

until convergence, that is, until abs(x^{(n+1)}-x^{(n)})< \text{tol}_\text{conv}, where \text{tol}_\text{conv} is an argument with default value 1e-8. Following Carreira-Perpinan (2000), the algorithm is started at each component location. Runs started from different components can converge to the same mode up to a small numerical difference, so such duplicates must be identified; the tolerance used for this can be controlled with the argument tol_x.
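
The iteration described above can be sketched in base R for a normal mixture. The function names post_prob, fp_step and find_modes_fp and the max_iter safeguard are hypothetical and not part of the package. The sketch assumes sigma holds component standard deviations, so each component mean is weighted by its inverse variance, as the zero-gradient condition of a Gaussian mixture requires. Duplicates are handled by keeping the first estimate found within tol_x.

```r
# Step (i): posterior component probabilities p(k | x)
post_prob <- function(x, eta, mu, sigma) {
  w <- eta * dnorm(x, mu, sigma)
  w / sum(w)
}

# Step (ii): precision-weighted average of the component means
fp_step <- function(x, eta, mu, sigma) {
  pk <- post_prob(x, eta, mu, sigma)
  sum(pk * mu / sigma^2) / sum(pk / sigma^2)
}

# Hypothetical driver: start at each component location, iterate to
# convergence, and drop estimates closer than tol_x to an earlier one.
find_modes_fp <- function(eta, mu, sigma, tol_x = 1e-6, tol_conv = 1e-8,
                          max_iter = 1000) {
  modes <- numeric(0)
  for (start in mu) {
    x <- start
    for (i in seq_len(max_iter)) {
      x_new <- fp_step(x, eta, mu, sigma)
      converged <- abs(x_new - x) < tol_conv
      x <- x_new
      if (converged) break
    }
    if (length(modes) == 0 || all(abs(modes - x) >= tol_x)) {
      modes <- c(modes, x)
    }
  }
  modes
}

fp_modes <- find_modes_fp(eta = c(0.5, 0.5), mu = c(0, 5), sigma = c(1, 2))
```

With these parameters the two components are well separated, so each starting point converges to its own mode, close to but not exactly at the component mean.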

MEM algorithm Following Li et al. (2007), a mode x is found by iterating the two steps:

(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},

(ii) \quad x^{(n+1)} = \text{argmax}_x \sum_k p(k|x^{(n)}) \text{log} p_k(x),

until convergence, that is, until abs(x^{(n+1)}-x^{(n)})< \text{tol}_\text{conv}, where \text{tol}_\text{conv} is an argument with default value 1e-8. The algorithm is started at each component location. Runs started from different components can converge to the same mode up to a small numerical difference, so modes which are closer than tol_x are merged.
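
A minimal base R sketch of the two MEM steps follows, assuming location-scale Student t components and using optimize() for the maximisation in step (ii). The names comp_pdf and mem_mode, the max_iter safeguard and the component family are assumptions for illustration, not the package's implementation. The sketch uses a looser convergence tolerance (1e-6) than the documented default, since a one-dimensional numerical maximiser cannot generally locate a maximum much more precisely than the square root of machine precision. It also assumes the weighted log-density has a single maximum on the search interval.

```r
# Location-scale Student t density (example component family only)
comp_pdf <- function(x, mu, sigma, nu) dt((x - mu) / sigma, df = nu) / sigma

# Hypothetical MEM iteration from a single starting point
mem_mode <- function(start, eta, mu, sigma, nu, range,
                     tol_conv = 1e-6, max_iter = 500) {
  x <- start
  for (i in seq_len(max_iter)) {
    # Step (i): posterior probabilities at the current iterate
    w <- eta * comp_pdf(x, mu, sigma, nu)
    pk <- w / sum(w)
    # Step (ii): maximise the weighted log-density with pk held fixed
    q <- function(z) sum(pk * log(comp_pdf(z, mu, sigma, nu)))
    x_new <- optimize(q, interval = range, maximum = TRUE, tol = 1e-9)$maximum
    converged <- abs(x_new - x) < tol_conv
    x <- x_new
    if (converged) break
  }
  x
}

eta <- c(0.8, 0.2)
mu <- c(0, 6)
sigma <- c(1, 2)
nu <- c(3, 100)

# Start the algorithm at each component location
mem_modes <- sapply(mu, mem_mode, eta = eta, mu = mu, sigma = sigma,
                    nu = nu, range = c(-5, 15))
```

Merging estimates that are closer than tol_x would follow as a separate step; it is omitted here because the two starting points lead to clearly distinct modes.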

Discrete method By definition, modes must satisfy either:

p(y_{m}-1) < p(y_{m}) > p(y_{m}+1);

p(y_{m}-1) < p(y_{m}) = p(y_{m}+1) = \ldots = p(y_{m}+l-1) > p(y_{m}+l).

The algorithm evaluates each location point against these two conditions.
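
The two conditions can be checked in base R with run-length encoding, as in the sketch below. The function name discrete_modes is hypothetical, and the sketch makes three assumptions that the text above does not state: probabilities outside the supplied support are treated as zero, ties are detected by exact equality, and type = "all" returns every point of a flat mode while type = "unique" drops flat modes.

```r
# Hypothetical helper: y are consecutive integer locations, p their
# probabilities under the mixture.
discrete_modes <- function(y, p, type = c("all", "unique")) {
  type <- match.arg(type)
  runs <- rle(p)                      # consecutive equal values form one run
  n <- length(runs$values)
  v <- c(0, runs$values, 0)           # probability zero outside the support
  # A run is a mode if it is strictly higher than both neighbouring runs
  is_peak <- v[2:(n + 1)] > v[1:n] & v[2:(n + 1)] > v[3:(n + 2)]
  if (type == "unique") is_peak <- is_peak & runs$lengths == 1
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  idx <- unlist(Map(seq, starts[is_peak], ends[is_peak]))
  y[idx]
}

# Hand-made probability mass function with a flat mode at 3 and 4
y <- 0:7
p <- c(0.05, 0.20, 0.10, 0.15, 0.15, 0.05, 0.25, 0.05)

modes_all <- discrete_modes(y, p, type = "all")        # 1, 3, 4, 6
modes_unique <- discrete_modes(y, p, type = "unique")  # 1, 6
```

For probabilities computed in floating point, mathematically equal values may differ in the last digits, so a tolerance-based comparison may be preferable to exact equality.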

Value

A list of class mix_mode containing:

mode_estimates

estimates of the mixture modes.

algo

algorithm used for mode estimation.

dist

from mixture.

dist_type

type of mixture distribution, i.e. continuous or discrete.

pars

from mixture.

pdf_func

from mixture.

K

from mixture.

nb_var

from mixture.

References

Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024). “Bayesian mode inference for discrete distributions in economics and finance.” Economics Letters, 235, 111579. ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.

Carreira-Perpinan MA (2000). “Mode-finding for mixtures of Gaussian distributions.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1318–1323. ISSN 1939-3539, doi:10.1109/34.888716.

Li J, Ray S, Lindsay BG (2007). “A Nonparametric Statistical Approach to Clustering via Mode Identification.” Journal of Machine Learning Research, 8, 1687–1723.

Examples


# Example with a normal distribution ====================================
mu = c(0,5)
sigma = c(1,2)
p = c(0.5,0.5)

params = c(eta = p, mu = mu, sigma = sigma)
mix = mixture(params, dist = "normal", range = c(-5,15))
modes = mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with a skew normal =============================================
xi = c(0,6)
omega = c(1,2)
alpha = c(0,0)
p = c(0.8,0.2)
params = c(eta = p, xi = xi, omega = omega, alpha = alpha)
dist = "skew_normal"

mix = mixture(params, dist = dist, range = c(-5,15))
modes = mix_mode(mix)
# summary(modes)
# plot(modes)

# Example with an arbitrary continuous distribution ======================
xi = c(0,6)
omega = c(1,2)
alpha = c(0,0)
nu = c(3,100)
p = c(0.8,0.2)
params = c(eta = p, mu = xi, sigma = omega, xi = alpha, nu = nu)

pdf_func <- function(x, pars) {
  sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"])
}

mix = mixture(params, pdf_func = pdf_func,
              dist_type = "continuous", loc = "mu", range = c(-5,15))
modes = mix_mode(mix)

# summary(modes)
# plot(modes, from = -4, to = 4)

# Example with a poisson distribution ====================================
lambda = c(0.1,10)
p = c(0.5,0.5)
params = c(eta = p, lambda = lambda)
dist = "poisson"


mix = mixture(params, range = c(0,50), dist = dist)

modes = mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with an arbitrary discrete distribution =======================
mu = c(20,5)
size = c(20,0.5)
p = c(0.5,0.5)
params = c(eta = p, mu = mu, size = size)


pmf_func <- function(x, pars) {
  dnbinom(x, mu = pars["mu"], size = pars["size"])
}

mix = mixture(params, range = c(0, 50),
              pdf_func = pmf_func, dist_type = "discrete")
modes = mix_mode(mix)

# summary(modes)
# plot(modes)


[Package BayesMultiMode version 0.7.1 Index]