dmix {mixbox}

R Documentation

Approximating the density function of the finite mixture models applied for model-based clustering.

Description

The density function of a G-component finite mixture model can be represented as

g({\bold{y}}|\bold{\Psi})=\sum_{g=1}^{G} \omega_{g} f_{\bold{Y}}({\bold{y}}, \bold{\Theta}_g),

where \bold{\Psi} = \bigl(\bold{\Theta}_{1},\cdots, \bold{\Theta}_{G}\bigr)^{\top} with \bold{\Theta}_g=\bigl(\omega_g, {\bold{\mu}}_g, {{\Sigma}}_g, {\bold{\lambda}}_g\bigr)^{\top}. Here, f_{\bold{Y}}(\bold{y}, \bold{\Theta}_g) denotes the density function of the random vector \bold{Y} within the g-th component. In the restricted case, a random vector \bold{Y} with density f_{\bold{Y}}(\bold{y}, \bold{\Theta}_g) admits the stochastic representation

{\bold{Y}} \mathop=\limits^d {\bold{\mu}}_{g}+\sqrt{W}{\bold{\lambda}}_{g}\vert{Z}_0\vert + \sqrt{W}{\Sigma}_{g}^{\frac{1}{2}} {\bold{Z}}_1,

where {\bold{\mu}}_{g} \in {R}^{d} is the location vector, {\bold{\lambda}}_{g} \in {R}^{d} is the skewness vector, and \Sigma_{g} is a positive definite symmetric dispersion matrix, for g=1,\cdots,G. Further, W is a positive random variable with mixing density function f_W(w| \bold{\theta}_{g}), {Z}_0\sim N(0, 1), and {\bold{Z}}_1\sim N_{d}\bigl( {\bold{0}}, \Sigma_{g}\bigr). The variables W, Z_0, and {\bold{Z}}_1 are mutually independent. In the canonical or unrestricted case, \bold{Y} admits the representation

{\bold{Y}} \mathop=\limits^d {\bold{\mu}}_{g}+\sqrt{W}{\bold{\Lambda}}_{g} \vert\bold{Z}_0\vert + \sqrt{W}{\Sigma}_{g}^{\frac{1}{2}} {\bold{Z}}_1,

where \bold{\Lambda}_{g} is the skewness matrix and the random vector \bold{Z}_0 follows a zero-mean normal distribution, truncated to the positive hyperplane R^{d}, whose independent marginals have unit variance. In the unrestricted case \bold{\Lambda}_{g} is a d \times d diagonal matrix, whereas in the canonical case it is a d\times q matrix, so that \bold{Z}_0 follows a zero-mean normal distribution truncated to the positive hyperplane R^{q}.
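
The restricted representation above can be read as a recipe for simulating one component. The following base R sketch is only an illustration and is not part of mixbox: the function name `r_restricted` and its argument `rW` (a user-supplied generator for W) are hypothetical, and the last term of the representation is assumed to contribute Gaussian noise with covariance W \Sigma_g.

```r
# Sketch: draw n samples from one component of the restricted representation
#   Y = mu + sqrt(W) * lambda * |Z0| + sqrt(W) * (Gaussian noise, covariance Sigma).
# Assumptions (not taken from mixbox): `rW` is a hypothetical user-supplied
# generator for W, and the Gaussian noise is produced with the Cholesky
# factor of Sigma.
r_restricted <- function(n, mu, Sigma, lambda, rW = function(n) rep(1, n)) {
  d  <- length(mu)
  W  <- rW(n)                                         # positive mixing variable
  Z0 <- abs(rnorm(n))                                 # half-normal |Z0|
  E  <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)    # rows ~ N_d(0, Sigma)
  # sqrt(W) has length n, so it scales row i of the n x d matrix by sqrt(W[i])
  sweep(sqrt(W) * (outer(Z0, lambda) + E), 2, mu, "+")
}

set.seed(1)
Sigma <- matrix(c(0.4, -0.2, -0.2, 0.5), 2, 2)
y <- r_restricted(5000, mu = c(-5, -5), Sigma = Sigma, lambda = c(5, -5))
colMeans(y)   # for W = 1, compare with mu + lambda * sqrt(2 / pi)
```

With the default `rW`, W is constant and equal to one, which mirrors `family = "constant"`; passing another generator for W gives the corresponding scale mixture.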

Usage

dmix(Y, G, weight, model = "restricted", mu, sigma, lambda, family = "constant",
skewness = "FALSE", param = NULL, theta = NULL, tick = NULL, N = 3000, log = "FALSE")

Arguments

`Y`	an `n\times d` matrix of observations.
`G`	number of components.
`weight`	a vector of weight parameters (or mixing proportions).
`model`	one of `"canonical"`, `"restricted"`, or `"unrestricted"`. By default `model = "restricted"`.
`mu`	a list of location vectors of `G` components.
`sigma`	a list of dispersion matrices of `G` components.
`lambda`	a list of skewness vectors of `G` components. If `model` is either `"canonical"` or `"unrestricted"`, then each skewness vector must be given as a matrix of appropriate size.
`family`	name of the mixing distribution. By default `family = "constant"`, which corresponds to the finite mixture of multivariate normal (or skew normal) distributions. Other candidates for the family name are: "bs" (for Birnbaum-Saunders), "burriii" (for Burr type III), "chisq" (for chi-square), "exp" (for exponential), "f" (for Fisher), "gamma" (for gamma), "gig" (for generalized inverse-Gaussian), "igamma" (for inverse-gamma), "igaussian" (for inverse-Gaussian), "lindley" (for Lindley), "loglog" (for log-logistic), "lognorm" (for log-normal), "lomax" (for Lomax), "pstable" (for positive `\alpha`-stable), "ptstable" (for polynomially tilted `\alpha`-stable), "rayleigh" (for Rayleigh), and "weibull" (for Weibull).
`skewness`	a logical statement. By default `skewness = "FALSE"`, which means that a symmetric model is fitted to each component (cluster). If `skewness = "TRUE"`, then a skewed model is fitted to each component.
`param`	names of the elements of `\bold{\theta}`, the parameter vector of the mixing distribution with density function `f_W(w| \bold{\theta})`. By default it is `NULL`.
`theta`	a list of maximum likelihood estimates of `\bold{\theta}` (the parameter vector of the mixing distribution with density function `f_W(w| \bold{\theta})`), one for each of the `G` components. By default it is `NULL`.
`tick`	a binary vector whose length depends on the family. The elements of `tick` are either `0` or `1`. If an element of `tick` is `0`, then the corresponding element of `\bold{\theta}` is not considered in the formula of `f_W(w|{\bold{\theta}})` for computing the required posterior expectations. If an element of `tick` is `1`, then the corresponding element of `\bold{\theta}` is considered in the formula of `f_W(w|{\bold{\theta}})`. For instance, if `family = "gamma"` and either its shape or rate parameter is one, then `tick = c(1)`. If instead `family = "gamma"` and both the shape and rate parameters appear in the formula of `f_W(w|{\bold{\theta}})`, then `tick = c(1, 1)`. By default `tick = NULL`.
`N`	an integer giving the number of Monte Carlo samples used for approximating `g({\bold{y}}|\bold{\Psi})`. By default `N = 3000`.
`log`	if `log = "TRUE"`, then the log of the density function is returned. By default `log = "FALSE"`.

Value

Monte Carlo approximations of the mixture model density function evaluated at the observations in `Y` (or of its logarithm if `log = "TRUE"`).
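
To illustrate what a Monte Carlo approximation of such a density can look like, the base R sketch below treats the symmetric case only, where the component density is the expectation over W of a normal density with covariance W \Sigma_g, and averages that density over `N` draws of W. It is not claimed to be the algorithm used inside `dmix`; the names `dmvn`, `mc_scale_mixture`, and `rW` are hypothetical.

```r
# Multivariate normal density at a single point y (base R only).
dmvn <- function(y, mu, Sigma) {
  d <- length(mu)
  R <- chol(Sigma)                                # Sigma = t(R) %*% R
  z <- backsolve(R, y - mu, transpose = TRUE)     # solves t(R) %*% z = y - mu
  exp(-0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * d * log(2 * pi))
}

# Monte Carlo estimate of E_W[ N_d(y; mu, W * Sigma) ] from N draws of W.
# `rW` is a hypothetical user-supplied generator for the mixing variable W.
mc_scale_mixture <- function(y, mu, Sigma, rW, N = 3000) {
  w <- rW(N)
  mean(vapply(w, function(wi) dmvn(y, mu, wi * Sigma), numeric(1)))
}

set.seed(1)
Sigma <- matrix(c(0.4, -0.2, -0.2, 0.5), 2, 2)
y  <- c(0.5, -0.5)
mu <- c(0, 0)

# W = 1 reproduces the normal density exactly
mc_scale_mixture(y, mu, Sigma, rW = function(n) rep(1, n))

# Inverse-gamma W with shape = rate = nu / 2 approximates a multivariate t density
nu <- 4
mc_scale_mixture(y, mu, Sigma,
                 rW = function(n) 1 / rgamma(n, shape = nu / 2, rate = nu / 2))
```

The accuracy of the average improves as `N` grows, which is the usual trade-off between run time and precision in this kind of approximation. The inverse-gamma parametrization used here is an assumption and need not match the one behind `family = "igamma"`.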

Author(s)

Mahdi Teimouri

Examples


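# A single bivariate observation (d = 2) at which the density is evaluated
      Y <- c(1, 2)
# Two components with equal mixing proportions
      G <- 2
 weight <- rep( 0.5, 2 )
# Location vectors of the two components
    mu1 <- rep(  -5, 2 )
    mu2 <- rep(   5, 2 )
# Symmetric positive definite dispersion matrices
 sigma1 <- matrix( c( 0.4, -0.20, -0.20, 0.5 ), nrow = 2, ncol = 2 )
 sigma2 <- matrix( c( 0.5,  0.20,  0.20, 0.4 ), nrow = 2, ncol = 2 )
# Skewness vectors (restricted model)
lambda1 <- c( 5, -5 )
lambda2 <- c(-5,  5 )
# Component parameters are passed as lists of length G
     mu <- list( mu1, mu2 )
  sigma <- list( sigma1 , sigma2 )
 lambda <- list( lambda1, lambda2)
# Density of a two-component mixture of restricted skew normal distributions
    out <- dmix(Y, G, weight, model = "restricted", mu, sigma, lambda, family =
           "constant", skewness = "TRUE", param = NULL, theta = NULL, tick =
           NULL, N = 3000)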
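
For the symmetric normal case (`family = "constant"` with `skewness = "FALSE"`), the mixture density has a closed form, which may be useful as a reference value when experimenting with the inputs of the example. The base R sketch below computes its logarithm with the log-sum-exp trick; it is an independent illustration, the helper names `ldmvn` and `log_gauss_mix` are hypothetical, and it does not cover the skewed model used in the call above.

```r
# Log-density of N_d(mu, Sigma) for each row of Y (base R only).
ldmvn <- function(Y, mu, Sigma) {
  Y <- matrix(Y, ncol = length(mu))               # a plain vector becomes one row
  R <- chol(Sigma)                                # Sigma = t(R) %*% R
  Z <- backsolve(R, t(Y) - mu, transpose = TRUE)  # one column per observation
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * length(mu) * log(2 * pi)
}

# Log of sum_g weight[g] * N_d(y; mu[[g]], sigma[[g]]) for each row of Y.
log_gauss_mix <- function(Y, weight, mu, sigma) {
  L <- sapply(seq_along(weight),
              function(g) log(weight[g]) + ldmvn(Y, mu[[g]], sigma[[g]]))
  L <- matrix(L, ncol = length(weight))           # n x G, also when n = 1
  m <- apply(L, 1, max)                           # row-wise maximum
  m + log(rowSums(exp(L - m)))                    # log-sum-exp
}

Y      <- c(1, 2)
weight <- rep(0.5, 2)
mu     <- list(rep(-5, 2), rep(5, 2))
sigma  <- list(matrix(c(0.4, -0.2, -0.2, 0.5), 2, 2),
               matrix(c(0.5,  0.2,  0.2, 0.4), 2, 2))
log_gauss_mix(Y, weight, mu, sigma)
```

Working on the log scale avoids underflow when an observation lies far from every component, as is the case for one of the two components here.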

[Package mixbox version 1.2.3 Index]