R: Density, Distribution and Quantile of Finite Mixture...

dfmx {fmx}

R Documentation

Density, Distribution and Quantile of Finite Mixture Distribution

Description

Density function, distribution function, quantile function and random generation for a finite mixture distribution with normal or Tukey g-&-h components.

Usage

dfmx(
  x,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  ...,
  log = FALSE
)

pfmx(
  q,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  ...,
  lower.tail = TRUE,
  log.p = FALSE
)

qfmx(
  p,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  interval = qfmx_interval(dist = dist),
  ...,
  lower.tail = TRUE,
  log.p = FALSE
)

rfmx(
  n,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w
)

Arguments

`x`, `q`	numeric vector, quantiles, `NA_real_` value(s) allowed.
`dist`	fmx object, a finite mixture distribution
`distname`, `K`, `pars`, `w`	auxiliary parameters, whose default values are determined by argument `dist`. The user-specified vector of `w` does not need to sum up to 1; `w/sum(w)` will be used internally.
`...`	additional parameters
`log`, `log.p`	logical scalar. If `TRUE`, probabilities are given as `\log(p)`.
`lower.tail`	logical scalar. If `TRUE` (default), probabilities are `Pr(X\le x)`, otherwise, `Pr(X>x)`.
`p`	numeric vector, probabilities.
`interval`	length two numeric vector, interval for root finding, see vuniroot2 and vuniroot
`n`	integer scalar, number of observations.

Details

A computational challenge in function dfmx is when mixture density is very close to 0, which happens when the per-component log densities are negative with big absolute values. In such case, we cannot compute the log mixture densities (i.e., -Inf), for the log-likelihood using function logLik.fmx. Our solution is to replace these -Inf log mixture densities by the weighted average (using the mixing proportions of dist) of the per-component log densities.

Function qfmx gives the quantile function, by numerically solving pfmx. One major challenge when dealing with the finite mixture of Tukey g-&-h family distribution is that Brent–Dekker's method needs to be performed in both pGH and qfmx functions, i.e. two layers of root-finding algorithm.

Value

Function dfmx returns a numeric vector of probability density values of an fmx object at specified quantiles x.

Function pfmx returns a numeric vector of cumulative probability values of an fmx object at specified quantiles q.

Function qfmx returns an unnamed numeric vector of quantiles of an fmx object, based on specified cumulative probabilities p.

Function rfmx generates random deviates of an fmx object.

Note

Function qnorm returns an unnamed vector of quantiles, although quantile returns a named vector of quantiles.

Examples

library(ggplot2)

(e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1)))
curve(dfmx(x, dist = e1), xlim = c(-3,7))
ggplot() + geom_function(fun = dfmx, args = list(dist = e1)) + xlim(-3,7)
ggplot() + geom_function(fun = pfmx, args = list(dist = e1)) + xlim(-3,7)
hist(rfmx(n = 1e3L, dist = e1), main = '1000 obs from e1')

x = (-3):7
round(dfmx(x, dist = e1), digits = 3L)
round(p1 <- pfmx(x, dist = e1), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p1, dist = e1), x, tol = 1e-4))

(e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3)))
ggplot() + geom_function(fun = dfmx, args = list(dist = e2)) + xlim(-3,7)

round(dfmx(x, dist = e2), digits = 3L)
round(p2 <- pfmx(x, dist = e2), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p2, dist = e2), x, tol = 1e-4))

(e3 = fmx('GH', g = .2, h = .01)) # one-component Tukey
ggplot() + geom_function(fun = dfmx, args = list(dist = e3)) + xlim(-3,5)
set.seed(124); r1 = rfmx(1e3L, dist = e3); 
set.seed(124); r2 = TukeyGH77::rGH(n = 1e3L, g = .2, h = .01)
stopifnot(identical(r1, r2)) # but ?rfmx has much cleaner code
round(dfmx(x, dist = e3), digits = 3L)
round(p3 <- pfmx(x, dist = e3), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p3, dist = e3), x, tol = 1e-4))


a1 = fmx('GH', A = c(7,9), B = c(.8, 1.2), g = c(.3, 0), h = c(0, .1), w = c(1, 1))
a2 = fmx('GH', A = c(6,9), B = c(.8, 1.2), g = c(-.3, 0), h = c(.2, .1), w = c(4, 6))
library(ggplot2)
(p = ggplot() + 
 geom_function(fun = pfmx, args = list(dist = a1), mapping = aes(color = 'g2=h1=0')) + 
 geom_function(fun = pfmx, args = list(dist = a2), mapping = aes(color = 'g2=0')) + 
 xlim(3,15) + 
 scale_y_continuous(labels = scales::percent) +
 labs(y = NULL, color = 'models') +
 coord_flip())
p + theme(legend.position = 'none')


# to use [rfmx] without \pkg{fmx}
(d = fmx(distname = 'GH', A = c(-1,1), B = c(.9,1.1), g = c(.3,-.2), h = c(.1,.05), w = c(2,3)))
d@pars
set.seed(14123); x = rfmx(n = 1e3L, dist = d)
set.seed(14123); x_raw = rfmx(n = 1e3L,
 distname = 'GH', K = 2L,
 pars = rbind(
  c(A = -1, B = .9, g = .3, h = .1),
  c(A = 1, B = 1.1, g = -.2, h = .05)
 ), 
 w = c(.4, .6)
)
stopifnot(identical(x, x_raw))

[Package fmx version 0.1.2 Index]