fitmixturegrouped {ForestFit}    R Documentation

Estimating parameters of the well-known mixture models fitted to the grouped data

Description

Estimates the parameters of gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is

$$F(x,\Theta) = \sum_{k=1}^{K}\omega_k F_k(x,\theta_k),$$

where $\Theta=(\theta_1,\dots,\theta_K)^T$ is the whole parameter vector, $\theta_k$ for $k=1,\dots,K$ is the parameter vector of the $k$-th component, i.e. $\theta_k=(\alpha_k,\beta_k)^{T}$, $F_k(\cdot,\theta_k)$ is the cdf of the $k$-th component, and the known constant $K$ is the number of components. The parameters $\alpha$ and $\beta$ are the shape and scale parameters. The weights $\omega_k$ sum to one, i.e. $\sum_{k=1}^{K}\omega_k=1$. The families considered for the cdf $F$ include gamma, log-normal, and Weibull. Suppose a sample of $n$ independent observations, each following a distribution with cdf $F$, has been divided into $m$ separate groups of the form $(r_{i-1},r_i]$, for $i=1,\dots,m$, with $f_i$ observations falling in the $i$-th group. Then the likelihood function of the observed data is given by

$$L(\Theta|f_1,\dots,f_m)=\frac{n!}{f_{1}!f_{2}!\dots f_{m}!}\prod_{i=1}^{m}\Bigl[\frac{F_i(\Theta)}{F(\Theta)}\Bigr]^{f_i},$$

where

$$F_i(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{i-1}}^{r_i}f(x|\theta_k)dx,$$

$$F(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{0}}^{r_m}f(x|\theta_k)dx,$$

in which $f(x|\theta_k)$ denotes the pdf of the $k$-th component. Using the EM algorithm proposed by Dempster et al. (1977), we can solve $\partial L(\Theta|f_1,\dots,f_m)/\partial \Theta=0$ by introducing two new missing variables.
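To make the grouped-data likelihood concrete, the following is a minimal sketch in base R that evaluates its logarithm for a Weibull mixture. The helper name grouped_loglik and the numbers in r and f are hypothetical and not part of ForestFit. The sketch assumes that F(Θ) is the mixture probability of the whole observed range, i.e. the sum of the group probabilities F_i(Θ).

```r
# Log-likelihood of grouped data under a K-component Weibull mixture.
# Hypothetical helper for illustration; not a ForestFit function.
grouped_loglik <- function(r, f, omega, alpha, beta) {
  # Mixture cdf evaluated at each of the m + 1 group boundaries
  mix_cdf <- sapply(r, function(x) {
    sum(omega * pweibull(x, shape = alpha, scale = beta))
  })
  Fi <- diff(mix_cdf)   # F_i(Theta): mixture probability of each group
  Ftot <- sum(Fi)       # F(Theta): assumed to be the probability of (r_0, r_m]
  n <- sum(f)
  # log of n! / (f_1! ... f_m!) plus sum of f_i * log(F_i / F)
  lgamma(n + 1) - sum(lgamma(f + 1)) + sum(f * log(Fi / Ftot))
}

# Illustrative boundaries and frequencies (made-up numbers)
r <- c(0, 0.5, 1, 2, 4)
f <- c(8, 12, 20, 10)
ll <- grouped_loglik(r, f, omega = c(0.3, 0.7), alpha = c(1, 2), beta = c(2, 1))
ll
```

An EM routine would increase this quantity at every iteration, so a helper of this kind can serve as a convergence check.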

Usage

fitmixturegrouped(family, r, f, K, initial=FALSE, starts)

Arguments

family

Name of the family: "gamma", "log-normal", "skew-normal", or "weibull".

r

A numeric vector of length $m+1$. The first element of $r$ is the lower bound of the first group and the other $m$ elements are the upper bounds of the $m$ groups. Note that the upper bound of the $(i-1)$-th group is the lower bound of the $i$-th group, for $i=2,\dots,m$. The lower bound of the first group and the upper bound of the $m$-th group are chosen arbitrarily. If raw data are available, the smallest and largest observations are chosen as the lower bound of the first group and the upper bound of the $m$-th group, respectively.

f

A numeric vector of length $m$ containing the group frequencies.

K

Number of components.

initial

Logical. If TRUE, the sequence of initial values $\omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K$ is supplied by the user. For the skew-normal case, the vector of initial values of the skewness parameters is appended. By default (initial=FALSE), the initial values are determined automatically by the k-means method of clustering.

starts

If initial=TRUE, then the sequence of initial values must be given here.
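As an illustration of the ordering described for the initial values, the sketch below assembles a starting vector for a two-component, two-parameter family. The numeric values are arbitrary placeholders, and the final call is left commented out because it needs ForestFit together with user-supplied r and f.

```r
# Starting values for K = 2 components, ordered as weights, shapes, scales.
# The numbers are illustrative assumptions, not recommended defaults.
K <- 2
omega0 <- c(0.5, 0.5)   # weights, must sum to one
alpha0 <- c(1.5, 2.5)   # shape parameters
beta0  <- c(1, 2)       # scale parameters
starts <- c(omega0, alpha0, beta0)

# Basic sanity checks before handing the vector to the fitting routine
stopifnot(length(starts) == 3 * K,
          abs(sum(omega0) - 1) < 1e-8,
          all(starts > 0))

# fitmixturegrouped("weibull", r, f, K, initial = TRUE, starts = starts)
```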

Details

The mixture models are assumed to be identifiable. For the skew-normal mixture model, the parameter vector of the $k$-th component takes the form $\theta_k=(\alpha_k,\beta_k,\lambda_k)^{T}$, where $\alpha_k$, $\beta_k$, and $\lambda_k$ denote the location, scale, and skewness parameters, respectively.
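For readers unfamiliar with the three-parameter skew-normal family, the following base R sketch writes out its density with location, scale, and skewness parameters. It assumes Azzalini's standard form; whether ForestFit uses exactly this parameterization is not stated here, and the function name dskewnormal is hypothetical.

```r
# Skew-normal density with location alpha, scale beta, and skewness lambda.
# Azzalini's form is assumed; this is not taken from the ForestFit source.
dskewnormal <- function(x, alpha, beta, lambda) {
  z <- (x - alpha) / beta
  2 / beta * dnorm(z) * pnorm(lambda * z)
}

# With lambda = 0 the density reduces to the normal density
dskewnormal(1, alpha = 0, beta = 2, lambda = 0)
dnorm(1, mean = 0, sd = 2)
```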

Value

  1. The output has two parts. The first part is the vector of estimated weight, shape, and scale parameters.

  2. The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.
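The information criteria in the list above are all functions of the maximized log-likelihood. The sketch below uses their textbook definitions, with p the number of free parameters and n the sample size. The helper info_criteria is hypothetical, and the exact formulas and parameter count used inside ForestFit are assumptions here (CAIC in particular is defined differently in some packages).

```r
# Textbook information criteria from a maximized log-likelihood.
# Hypothetical helper; ForestFit may use different conventions.
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

# Illustrative values: log-likelihood -60, p = 5 parameters, n = 50 observations
ic <- info_criteria(loglik = -60, p = 5, n = 50)
ic
```

Smaller values indicate a better trade-off between fit and complexity, so these measures can be compared across fits with different K or different families.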

Author(s)

Mahdi Teimouri

References

G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578

Examples

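# Simulate n observations from a two-component Weibull mixture
n<-50
K<-2
m<-10
weight<-c(0.3,0.7)
alpha<-c(1,2)
beta<-c(2,1)
param<-c(weight,alpha,beta)
data<-rmixture(n, "weibull", K, param)
# Group the data into m classes; r holds the m+1 class boundaries
r<-seq(min(data),max(data),length.out=m+1)
D<-data.frame(table(cut(data,r,labels=NULL,include.lowest=TRUE,right=FALSE,dig.lab=4)))
# f holds the m class frequencies
f<-D$Freq
# Fit the mixture, letting the function choose the initial values
fitmixturegrouped("weibull",r,f,K,initial=FALSE)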

[Package ForestFit version 2.2.3 Index]