R: Estimating parameters of the well-known mixture models

fitmixture {ForestFit}

R Documentation

Estimating parameters of the well-known mixture models

Description

Estimates the parameters of a mixture model using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is given by

F(x,{\Theta}) = \sum_{j=1}^{K}\omega_j F_j(x,\theta_j),

where \Theta=(\theta_1,\dots,\theta_K)^T is the whole parameter vector, \theta_j for j=1,\dots,K is the parameter vector of the j-th component, i.e. \theta_j=(\alpha_j,\beta_j)^{T}, F_j(.,\theta_j) is the cdf of the j-th component, and the known constant K is the number of components. The parameters \alpha and \beta are either the shape and scale parameters, or both are shape parameters. In the latter case, \alpha and \beta are called the first and second shape parameters, respectively. The weights \omega_j sum to one, i.e. \sum_{j=1}^{K}\omega_j=1. The families considered for the component cdf F_j include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
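
The mixture cdf above can be evaluated directly. The following is a minimal sketch in base R for Weibull components only; the helper name `mixture_cdf` and all parameter values are illustrative assumptions and are not part of ForestFit.

```r
# Mixture cdf F(x, Theta) = sum_j omega_j * F_j(x, theta_j) with Weibull components.
# mixture_cdf is a hypothetical helper; the parameter values are made up.
mixture_cdf <- function(x, omega, shape, scale) {
  stopifnot(length(omega) == length(shape),
            length(shape) == length(scale),
            abs(sum(omega) - 1) < 1e-8)   # weights must sum to one
  # One column per component: omega_j * F_j(x, theta_j)
  parts <- sapply(seq_along(omega), function(j) {
    omega[j] * pweibull(x, shape = shape[j], scale = scale[j])
  })
  # Sum the weighted component cdfs for each value of x
  rowSums(matrix(parts, nrow = length(x)))
}

omega <- c(0.5, 0.3, 0.2)   # omega_1, ..., omega_K
shape <- c(1.5, 3.0, 5.0)   # alpha_1, ..., alpha_K
scale <- c(5, 12, 20)       # beta_1, ..., beta_K
x <- c(0, 5, 10, 20, Inf)
Fx <- mixture_cdf(x, omega, shape, scale)
```

Because each component cdf is nondecreasing and the weights sum to one, `Fx` runs from 0 at `x = 0` to 1 at `x = Inf`.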

Usage

fitmixture(data, family, K, initial=FALSE, starts)

Arguments

`data`	Vector of observations.
`family`	Name of the family, one of: "`birnbaum-saunders`", "`burrxii`", "`chen`", "`f`", "`Frechet`", "`gamma`", "`gompetrz`", "`log-normal`", "`log-logistic`", "`lomax`", "`skew-normal`", and "`weibull`".
`K`	Number of components.
`initial`	Logical. If `TRUE`, the initial values are supplied through `starts`. By default (`initial=FALSE`) the initial values are determined automatically by the k-means clustering method.
`starts`	The sequence of initial values, in the order `\omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K`. For the skew-normal case, the vector of initial values of the skewness parameters is appended. Must be given if `initial=TRUE`.
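
As an illustration of user-supplied starting values, the sketch below builds a `starts` vector in the documented order (weights, then \alpha values, then \beta values) for a 3-component Weibull mixture. The numeric values are hypothetical guesses, not recommended defaults, and the fit is only attempted if ForestFit is installed.

```r
# Hypothetical starting values for a 3-component Weibull mixture.
K <- 3
omega0 <- c(0.4, 0.4, 0.2)   # initial weights, must sum to one
alpha0 <- c(2, 3, 4)         # initial shape parameters (illustrative)
beta0  <- c(6, 12, 20)       # initial scale parameters (illustrative)

# Documented order: omega_1..omega_K, alpha_1..alpha_K, beta_1..beta_K
starts <- c(omega0, alpha0, beta0)

if (requireNamespace("ForestFit", quietly = TRUE)) {
  data(HW, package = "ForestFit")
  fit <- ForestFit::fitmixture(HW$DIA, "weibull", K,
                               initial = TRUE, starts = starts)
}
```

Poorly chosen starting values can lead the EM algorithm to a local maximum, so it may be worth comparing the result with the default k-means initialization.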

Details

Identifiability of the mixture model is assumed to hold. For the skew-normal case we have \theta_j=(\alpha_j,\beta_j,\lambda_j)^{T}, in which -\infty<\alpha_j<\infty, \beta_j>0, and -\infty<\lambda_j<\infty are, respectively, the location, scale, and skewness parameters of the j-th component; see Azzalini (1985).
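
For reference, the skew-normal density of Azzalini (1985) with location \alpha, scale \beta, and skewness \lambda can be written as f(x) = (2/\beta) \phi(z) \Phi(\lambda z) with z = (x-\alpha)/\beta. The base-R sketch below uses a hypothetical helper `dskewnormal` (not a ForestFit function) to show this parameterization; it is assumed, not confirmed, that the package uses the same form internally.

```r
# Skew-normal density in the (location, scale, skewness) parameterization.
# dskewnormal is an illustrative helper, not part of ForestFit.
dskewnormal <- function(x, alpha, beta, lambda) {
  stopifnot(beta > 0)
  z <- (x - alpha) / beta
  2 / beta * dnorm(z) * pnorm(lambda * z)
}

# The density should integrate to one for any valid parameter values
total <- integrate(dskewnormal, -Inf, Inf,
                   alpha = 1, beta = 2, lambda = 3)$value

# With lambda = 0 the density reduces to the normal density
x0 <- c(-1, 0, 2)
same_as_normal <- all.equal(dskewnormal(x0, 0, 1, 0), dnorm(x0))
```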

Value

The output has three parts. The first part is the vector of estimated weight, shape, and scale parameters.
The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.
The last part of the output contains the clustering vector.
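
To make the likelihood-based measures concrete, the sketch below computes AIC, CAIC, BIC, and HQIC from a log-likelihood using their common textbook definitions. The helper `info_criteria`, the parameter count of K(p+1)-1 free parameters, and the input values are assumptions for illustration; the exact definitions used inside ForestFit may differ.

```r
# Textbook information criteria from a maximized log-likelihood.
# info_criteria is a hypothetical helper, not a ForestFit function.
info_criteria <- function(loglik, n, K, per_component = 2) {
  # Free parameters: per_component per component plus K - 1 free weights
  k <- K * per_component + (K - 1)
  c(AIC  = 2 * k - 2 * loglik,
    CAIC = k * (log(n) + 1) - 2 * loglik,
    BIC  = k * log(n) - 2 * loglik,
    HQIC = 2 * k * log(log(n)) - 2 * loglik)
}

# Made-up inputs: log-likelihood -250, 100 observations, 3 components
ic <- info_criteria(loglik = -250, n = 100, K = 3)
```

Smaller values indicate a better trade-off between fit and complexity, which is how such criteria are typically used to compare different choices of `family` or `K`.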

Author(s)

Mahdi Teimouri

References

A. Azzalini, 1985. A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.

A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. EM algorithm for symmetric stable mixture model, Communications in Statistics-Simulation and Computation, 47(2), 582-604.

Examples

# Model the northern hardwood uneven-age forest data (HW$DIA), in inches, using a
# 3-component Weibull mixture distribution.
library(ForestFit)
data(HW)
data <- HW$DIA
K <- 3
fitmixture(data, "weibull", K, initial = FALSE)

[Package ForestFit version 2.2.3 Index]