fitmixture {ForestFit}    R Documentation

Estimating parameters of the well-known mixture models


Estimates the parameters of a mixture model using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is given by

$$F(x,\Theta) = \sum_{j=1}^{K}\omega_j F_j(x,\theta_j),$$

where $\Theta=(\theta_1,\dots,\theta_K)^T$ is the whole parameter vector, $\theta_j=(\alpha_j,\beta_j)^{T}$ for $j=1,\dots,K$ is the parameter vector of the $j$-th component, $F_j(.,\theta_j)$ is the cdf of the $j$-th component, and the known constant $K$ is the number of components. The parameters $\alpha$ and $\beta$ are either the shape and scale parameters, or both are shape parameters. In the latter case, $\alpha$ and $\beta$ are called the first and second shape parameters, respectively. The weights $\omega_j$ sum to one, i.e. $\sum_{j=1}^{K}\omega_j=1$. The families considered for the component cdf $F_j$ include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
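As an illustration of the mixture cdf above, the following minimal sketch evaluates it for Weibull components using only base R. The helper name `mix_cdf_weibull` and the parameter values are hypothetical and are not part of ForestFit; the sketch assumes `alpha` holds the shapes and `beta` the scales.

```r
# Hypothetical helper (not part of ForestFit): cdf of a K-component
# Weibull mixture, F(x) = sum_j omega_j * F_j(x; alpha_j, beta_j).
mix_cdf_weibull <- function(x, omega, alpha, beta) {
  stopifnot(length(omega) == length(alpha),
            length(alpha) == length(beta),
            abs(sum(omega) - 1) < 1e-8)
  # One column per component: F_j(x, theta_j)
  comp <- mapply(function(a, b) pweibull(x, shape = a, scale = b), alpha, beta)
  comp <- matrix(comp, nrow = length(x))
  # Weighted sum over the components
  drop(comp %*% omega)
}

# Illustrative (made-up) parameter values for K = 2
omega <- c(0.3, 0.7)   # weights, sum to one
alpha <- c(1.5, 3.0)   # shape parameters
beta  <- c(2.0, 10.0)  # scale parameters
Fx <- mix_cdf_weibull(c(0, 1, 5, 50), omega, alpha, beta)
```

Because each component cdf is nondecreasing and the weights are nonnegative and sum to one, the result is itself a valid cdf.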


fitmixture(data, family, K, initial=FALSE, starts)



Vector of observations.


Name of the family including: "birnbaum-saunders", "burrxii", "chen", "f", "Frechet", "gamma", "gompertz", "log-normal", "log-logistic", "lomax", "skew-normal", and "weibull".


Number of components.


The sequence of initial values including $\omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K$. For the skew-normal case, the vector of initial values of the skewness parameters is added. By default, the initial values are determined automatically by the k-means clustering method.


If initial=TRUE, then the sequence of initial values must be given.
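To make the layout of the initial values concrete, here is a rough sketch of how a vector of weights, shapes, and scales could be assembled from a k-means partition for a Weibull mixture. This is an assumption-laden illustration, not ForestFit's internal routine: the simulated data, the variable names, and the rule-of-thumb shape estimate based on the coefficient of variation are all choices made for this example.

```r
# Sketch only: build initial values (weights, shapes, scales) from k-means.
set.seed(1)
x <- c(rweibull(150, shape = 2, scale = 5),
       rweibull(150, shape = 6, scale = 20))  # simulated data (assumption)
K <- 2
cl <- kmeans(x, centers = K)$cluster

# Initial weights: proportion of observations in each cluster
omega0 <- as.numeric(table(factor(cl, levels = seq_len(K)))) / length(x)

# Crude per-cluster Weibull guesses (assumption): shape from the rule of
# thumb (sd / mean)^(-1.086), scale from matching the cluster mean
alpha0 <- sapply(seq_len(K), function(j) {
  xj <- x[cl == j]
  (sd(xj) / mean(xj))^(-1.086)
})
beta0 <- sapply(seq_len(K), function(j) {
  mean(x[cl == j]) / gamma(1 + 1 / alpha0[j])
})

# Weights first, then shapes, then scales
starts <- c(omega0, alpha0, beta0)
# Possible use: fitmixture(x, "weibull", K, initial = TRUE, starts = starts)
```

The final commented line shows how such a vector would be passed together with initial=TRUE, following the usage given above.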


It is worth noting that the mixture model is assumed to be identifiable. For the skew-normal case we have $\theta_j=(\alpha_j,\beta_j,\lambda_j)^{T}$, in which $-\infty<\alpha_j<\infty$, $\beta_j>0$, and $-\infty<\lambda_j<\infty$ are, respectively, the location, scale, and skewness parameters of the $j$-th component; see Azzalini (1985).
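For readers unfamiliar with EM for mixtures, the sketch below shows one generic iteration for the mixing weights of a Weibull mixture: the E-step computes posterior membership probabilities, and the weight update averages them. It is a textbook illustration under stated assumptions, not the package's code. The function name `em_weight_step` is hypothetical, the shape and scale updates (which need numerical optimization) are omitted, and taking the most probable component as the cluster label is an assumption about how a clustering vector may be formed.

```r
# One generic EM step for the weights of a Weibull mixture (sketch).
em_weight_step <- function(x, omega, alpha, beta) {
  # Component densities f_j(x_i), one column per component
  dens <- mapply(function(a, b) dweibull(x, shape = a, scale = b), alpha, beta)
  dens <- matrix(dens, nrow = length(x))
  # E-step: tau_ij = omega_j f_j(x_i) / sum_l omega_l f_l(x_i)
  num <- sweep(dens, 2, omega, "*")
  tau <- num / rowSums(num)
  list(
    omega   = colMeans(tau),                        # updated weights
    cluster = max.col(tau, ties.method = "first"),  # most probable component
    loglik  = sum(log(rowSums(num)))                # log-likelihood at the inputs
  )
}

# Simulated data and made-up current parameter values
set.seed(2)
x <- c(rweibull(100, shape = 2, scale = 5),
       rweibull(100, shape = 6, scale = 20))
step <- em_weight_step(x, omega = c(0.5, 0.5), alpha = c(2, 6), beta = c(5, 20))
```

Repeating such steps, together with updates of the component parameters, until the log-likelihood stabilizes gives the usual EM scheme described by Dempster, Laird, and Rubin (1977).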


  1. The output has three parts. The first part includes the vector of estimated weight, shape, and scale parameters.

  2. The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.

  3. The last part of the output contains the clustering vector.
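The likelihood-based criteria listed above can be related to the log-likelihood with a short sketch. The formulas below are common textbook definitions (with Bozdogan's form for CAIC); the package may use different variants or a different parameter count, so treat the helper `info_criteria` and the numbers passed to it as hypothetical.

```r
# Textbook information criteria (sketch); smaller values indicate a better fit.
# loglik: maximized log-likelihood, p: number of free parameters, n: sample size.
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

# Made-up inputs: a 3-component Weibull mixture has 3 shapes, 3 scales,
# and 2 free weights, so p = 8 under this counting (assumption)
ic <- info_criteria(loglik = -250, p = 8, n = 100)
```

All four criteria penalize the same fitted log-likelihood and differ only in how strongly the number of parameters is penalized as the sample size grows.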


Mahdi Teimouri


A. Azzalini, 1985. A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.

A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. EM algorithm for symmetric stable mixture model, Communications in Statistics-Simulation and Computation, 47(2), 582-604.


# Here we model the northern hardwood uneven-age forest data (HW$DIA) in inches using a
# 3-component Weibull mixture distribution.
library(ForestFit)
data(HW)  # assumes the HW data set is shipped with the package
fitmixture(HW$DIA, "weibull", K = 3, initial = FALSE)

[Package ForestFit version 2.2.3 Index]