Approximating the density function of finite mixture models used for model-based clustering.
Description
The density function of a G-component finite mixture model can be represented as
$$g(y \mid \Psi) = \sum_{g=1}^{G} \omega_g f_Y(y, \Theta_g),$$
where Ψ = (Θ_1, ⋯, Θ_G)⊤ with Θ_g = (ω_g, μ_g, Σ_g, λ_g)⊤. Here, f_Y(y, Θ_g) is the density function of the random vector Y within component g. In the restricted case, f_Y(y, Θ_g) admits the representation
$$Y \overset{d}{=} \mu_g + \sqrt{W}\,\lambda_g |Z_0| + \sqrt{W}\,\Sigma_g^{1/2} Z_1,$$
where μ_g ∈ R^d is the location vector, λ_g ∈ R^d is the skewness vector, and Σ_g is a symmetric positive definite dispersion matrix, for g = 1, ⋯, G. Further, W is a positive random variable with mixing density function f_W(w∣θ_g), Z_0 ∼ N(0, 1), and Z_1 ∼ N_d(0, Σ_g). The variables W, Z_0, and Z_1 are mutually independent. In the canonical or unrestricted case, f_Y(y, Θ_g) admits the representation
$$Y \overset{d}{=} \mu_g + \sqrt{W}\,\Lambda_g |Z_0| + \sqrt{W}\,\Sigma_g^{1/2} Z_1,$$
where Λ_g is the skewness matrix and the random vector Z_0 follows a zero-mean normal distribution truncated to the positive orthant of R^d, with independent marginals of unit variance. In the unrestricted case, Λ_g is a d×d diagonal matrix. In the canonical case, Λ_g is a d×q matrix, so Z_0 is instead a zero-mean normal random vector truncated to the positive orthant of R^q.
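The restricted representation can be made concrete by simulating from a single component. The following is a minimal base R sketch, not the package's own sampler. It fixes the mixing variable at W = 1 (the case that family = "constant" describes), and it draws the Gaussian term as one N_d(0, Σ_g) vector through a Cholesky factor; how the package factors Σ_g internally is not stated here, so that choice is an assumption. The helper name `r_restricted` and all parameter values are hypothetical.

```r
# Sketch: draw n vectors from one restricted component with W fixed at 1.
r_restricted <- function(n, mu, Sigma, lambda) {
  d  <- length(mu)
  z0 <- abs(rnorm(n))                       # |Z0|: one half-normal scalar per draw
  R  <- chol(Sigma)                         # upper triangular, t(R) %*% R == Sigma
  e  <- matrix(rnorm(n * d), n, d) %*% R    # rows distributed as N_d(0, Sigma)
  # mu + lambda * |Z0| + Gaussian term, one row per draw
  sweep(outer(z0, lambda) + e, 2, mu, "+")
}

set.seed(1)
mu     <- c(0, 0)
Sigma  <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
lambda <- c(3, -3)

Y <- r_restricted(5000, mu, Sigma, lambda)
colMeans(Y)   # close to mu + lambda * sqrt(2 / pi), because E|Z0| = sqrt(2 / pi)
```

Because a single scalar |Z_0| multiplies the whole vector λ_g, every coordinate is shifted by the same half-normal draw. This is what distinguishes the restricted form from the unrestricted one, where each coordinate has its own truncated normal variable.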
Usage
dmix(Y, G, weight, model = "restricted", mu, sigma, lambda, family = "constant",
skewness = "FALSE", param = NULL, theta = NULL, tick = NULL, N = 3000, log = "FALSE")
Arguments
Y
an n×d matrix of observations.
G
number of components.
weight
a vector of weight parameters (or mixing proportions).
model
it must be "canonical", "restricted", or "unrestricted". By default model = "restricted".
mu
a list of location vectors of G components.
sigma
a list of dispersion matrices of G components.
lambda
a list of skewness vectors of G components. If model is either "canonical" or "unrestricted", then each skewness parameter must be given as a matrix of the appropriate size.
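The list-valued arguments are easy to get wrong, so a small base R sketch of their shapes may help. It builds a hypothetical two-component, two-dimensional setup in the form the descriptions above call for and runs basic sanity checks. All numeric values and the helper `is_pd` are illustrative assumptions, and nothing here calls dmix itself.

```r
# Hypothetical setup with G = 2 components in d = 2 dimensions.
G <- 2
d <- 2

weight <- c(0.4, 0.6)                                    # mixing proportions
mu     <- list(c(-5, -5), c(5, 5))                       # one location vector per component
sigma  <- list(matrix(c(0.4, -0.2, -0.2, 0.4), d, d),    # one dispersion matrix per component
               matrix(c(0.5,  0.2,  0.2, 0.5), d, d))

# model = "restricted": skewness given as vectors of length d.
lambda_restricted <- list(c(5, -5), c(-5, 5))

# model = "unrestricted": skewness given as d x d diagonal matrices.
lambda_unrestricted <- list(diag(c(5, -5)), diag(c(-5, 5)))

# Symmetric positive definite check for a dispersion matrix.
is_pd <- function(S) isSymmetric(S) && all(eigen(S, symmetric = TRUE)$values > 0)

stopifnot(
  length(weight) == G, all(weight > 0), abs(sum(weight) - 1) < 1e-12,
  length(mu) == G, length(sigma) == G,
  all(vapply(sigma, is_pd, logical(1))),
  all(vapply(lambda_restricted, length, integer(1)) == d),
  all(vapply(lambda_unrestricted, function(L) all(dim(L) == c(d, d)), logical(1)))
)
```

For model = "canonical", each list element would instead be a d×q matrix, with q chosen by the user.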
family
name of the mixing distribution. By default family = "constant", which corresponds to the finite mixture of multivariate normal (or skew normal) distributions. Other candidates for the family name are: "bs" (for Birnbaum-Saunders), "burriii" (for Burr type III), "chisq" (for chi-square), "exp" (for exponential), "f" (for Fisher), "gamma" (for gamma), "gig" (for generalized inverse-Gaussian), "igamma" (for inverse-gamma), "igaussian" (for inverse-Gaussian), "lindley" (for Lindley), "loglog" (for log-logistic), "lognorm" (for log-normal), "lomax" (for Lomax), "pstable" (for positive α-stable), "ptstable" (for polynomially tilted α-stable), "rayleigh" (for Rayleigh), and "weibull" (for Weibull).
skewness
a logical statement. By default skewness = "FALSE", which means that a symmetric model is fitted to each component (cluster). If skewness = "TRUE", then a skewed model is fitted to each component.
param
names of the elements of θ, the parameter vector of the mixing distribution with density function f_W(w∣θ). By default it is NULL.
theta
a list of maximum likelihood estimates of θ (the parameter vector of the mixing distribution with density function f_W(w∣θ)), one per component across the G components. By default it is NULL.
tick
a binary vector whose length depends on the family. Each element of tick is either 0 or 1. If an element of tick is 0, the corresponding element of θ is not used in the formula of f_W(w∣θ) for computing the required posterior expectations. If an element of tick is 1, the corresponding element of θ is used in the formula of f_W(w∣θ). For instance, if family = "gamma" and either its shape or rate parameter is equal to one, then tick = c(1). If instead family = "gamma" and both the shape and rate parameters appear in the formula of f_W(w∣θ), then tick = c(1, 1). By default tick = NULL.
N
an integer controlling the Monte Carlo approximation of g(y∣Ψ). By default N = 3000.
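How N enters the computation is not spelled out above. One common reading, sketched below in base R, is that the density of a scale mixture is approximated by averaging the conditional normal density over N draws of the mixing variable. The model used here is an assumption chosen because it has a known answer: a univariate Y = μ + √W·s·Z with Z ∼ N(0, 1) and W inverse-gamma with shape and rate ν/2, for which Y is Student t with ν degrees of freedom. The helper `mc_density` is hypothetical and is not the package's routine.

```r
# Sketch: Monte Carlo approximation of a univariate scale-mixture density.
mc_density <- function(y, mu, s, nu, N = 3000) {
  # N draws of W from the inverse-gamma(nu / 2, nu / 2) mixing distribution
  w <- 1 / rgamma(N, shape = nu / 2, rate = nu / 2)
  # For each y, average the conditional N(mu, s^2 * w) density over the draws
  vapply(y, function(yi) mean(dnorm(yi, mean = mu, sd = s * sqrt(w))), numeric(1))
}

set.seed(123)
y     <- c(-2, 0, 0.5, 3)
mc    <- mc_density(y, mu = 0, s = 1, nu = 5)
exact <- dt(y, df = 5)          # closed form available for this particular mixing law

max(abs(mc - exact))            # Monte Carlo error, shrinking roughly like 1 / sqrt(N)
```

Larger N reduces the approximation error at a proportional cost in run time, which is the usual trade-off when tuning a Monte Carlo sample size.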
log
if log = "TRUE", then the log of the density function is returned. By default log = "FALSE".
Value
Monte Carlo approximated values of the mixture model density function, evaluated at the observations in Y.
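For the symmetric Gaussian case, the mixture density g(y∣Ψ) has a closed form, which makes it a convenient reference when checking approximated values. The base R sketch below evaluates that closed form for each row of a matrix, with an option to return the log density. The helpers `log_dmvnorm` and `gauss_mix_density`, the argument `as_log`, and the numeric values are all hypothetical. The log-sum-exp step is a standard numerical device and is not claimed to be what the package does.

```r
# Multivariate normal log-density for each row of Y (base R only).
log_dmvnorm <- function(Y, mu, Sigma) {
  R <- chol(Sigma)                                          # t(R) %*% R == Sigma
  z <- backsolve(R, t(sweep(Y, 2, mu)), transpose = TRUE)   # solves t(R) z = y - mu
  -0.5 * colSums(z^2) - sum(log(diag(R))) - 0.5 * ncol(Y) * log(2 * pi)
}

# g(y) = sum over g of weight[g] * N_d(y; mu[[g]], sigma[[g]])
gauss_mix_density <- function(Y, weight, mu, sigma, as_log = FALSE) {
  # n x G matrix holding log(weight_g) + log f(y_i; Theta_g)
  lp <- sapply(seq_along(weight), function(g)
    log(weight[g]) + log_dmvnorm(Y, mu[[g]], sigma[[g]]))
  lp <- matrix(lp, nrow = nrow(Y))
  m   <- apply(lp, 1, max)                  # row maxima for a stable log-sum-exp
  out <- m + log(rowSums(exp(lp - m)))
  if (as_log) out else exp(out)
}

Y      <- rbind(c(0, 0), c(1, 2))
weight <- c(0.5, 0.5)
mu     <- list(c(0, 0), c(1, 2))
sigma  <- list(diag(2), diag(2))

dens     <- gauss_mix_density(Y, weight, mu, sigma)
log_dens <- gauss_mix_density(Y, weight, mu, sigma, as_log = TRUE)
```

Working on the log scale until the final step avoids underflow when an observation lies far from every component, which is the main practical reason to offer a log option at all.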