fitmixturegrouped {ForestFit}    R Documentation

Estimating parameters of the well-known mixture models fitted to the grouped data

Description

Estimates the parameters of gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is

F(x,{\Theta}) = \sum_{k=1}^{K}\omega_k F_k(x,\theta_k),

where \Theta=(\theta_1,\dots,\theta_K)^T is the whole parameter vector, \theta_k=(\alpha_k,\beta_k)^{T}, for k=1,\dots,K, is the parameter vector of the k-th component, F_k(.,\theta_k) is the cdf of the k-th component, and the known constant K is the number of components. The parameters \alpha_k and \beta_k are the shape and scale parameters, respectively. The weights \omega_k sum to one, i.e. \sum_{k=1}^{K}\omega_k=1. The families considered for the cdf F_k include gamma, log-normal, and Weibull. Suppose a sample of n independent observations, each following a distribution with cdf F, has been divided into m separate groups of the form (r_{i-1},r_i], for i=1,\dots,m, with f_i observations falling in the i-th group. Then the likelihood function of the observed data is given by

L(\Theta|f_1,\dots,f_m)=\frac{n!}{f_{1}!f_{2}!\dots f_{m}!}\prod_{i=1}^{m}\Bigl[\frac{F_i(\Theta)}{F(\Theta)}\Bigr]^{f_i},

where

F_i(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{i-1}}^{r_i}f(x|\theta_k)dx,

F(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_0}^{r_m}f(x|\theta_k)dx,

in which f(x|\theta_k) denotes the pdf of the k-th component. Using the EM algorithm proposed by Dempster et al. (1977), we can solve \partial L(\Theta|f_1,\dots,f_m)/{\partial \Theta}=0 by introducing two new missing variables.
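The quantities in the likelihood above can be evaluated directly in base R. The following is a minimal sketch for Weibull components only, not the package's implementation. It assumes that F(\Theta) is the total mixture probability over the whole range (r_0, r_m]. The helper names mixture_cdf and grouped_loglik, and the boundaries and counts, are hypothetical and chosen for illustration.

```r
# Mixture cdf F(x, Theta) = sum_k omega_k * F_k(x, theta_k), Weibull components.
# alpha = shape, beta = scale, as in the text.
mixture_cdf <- function(x, omega, alpha, beta) {
  stopifnot(abs(sum(omega) - 1) < 1e-8)      # weights must sum to one
  total <- numeric(length(x))
  for (k in seq_along(omega)) {
    total <- total + omega[k] * pweibull(x, shape = alpha[k], scale = beta[k])
  }
  total
}

# Grouped-data log-likelihood for boundaries r (length m + 1)
# and group frequencies f (length m).
grouped_loglik <- function(r, f, omega, alpha, beta) {
  stopifnot(length(r) == length(f) + 1)
  cdf_at_r <- mixture_cdf(r, omega, alpha, beta)
  F_i <- diff(cdf_at_r)                          # probability of each (r_{i-1}, r_i]
  F_all <- cdf_at_r[length(r)] - cdf_at_r[1]     # assumed: probability of (r_0, r_m]
  n <- sum(f)
  log_coef <- lgamma(n + 1) - sum(lgamma(f + 1)) # log of n! / (f_1! ... f_m!)
  log_coef + sum(f * log(F_i / F_all))
}

r <- c(0, 0.5, 1, 2, 4)    # hypothetical group boundaries
f <- c(12, 15, 16, 7)      # hypothetical group frequencies
ll <- grouped_loglik(r, f, omega = c(0.3, 0.7), alpha = c(1, 2), beta = c(2, 1))
```

The sketch does not guard against empty groups with zero probability, where f_i * log(F_i) is undefined; a more careful version would treat that case separately.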

Usage

fitmixturegrouped(family, r, f, K, initial=FALSE, starts)

Arguments

family

Name of the family, one of: "gamma", "log-normal", "skew-normal", or "weibull".

r

A numeric vector of length m+1. The first element of r is the lower bound of the first group, and the other m elements are the upper bounds of the m groups. Note that the upper bound of the (i-1)-th group is the lower bound of the i-th group, for i=2,\dots,m. The lower bound of the first group and the upper bound of the m-th group are chosen arbitrarily. If raw data are available, the smallest and largest observations are used as the lower bound of the first group and the upper bound of the m-th group, respectively.

f

A numeric vector of length m containing the group frequencies.

K

Number of components.

initial

Logical. If initial=FALSE (the default), the initial values are determined automatically by the k-means clustering method. If initial=TRUE, the initial values must be supplied through starts.

starts

The sequence of initial values, given in the order \omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K; required if initial=TRUE. For the skew-normal case, the vector of initial values of the skewness parameters is appended.
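As an illustration of how the arguments fit together, the sketch below builds r and f from raw data and lays out a vector of initial values in the order weights, shapes, scales. It uses base R only. The simulated sample and the particular starting values are hypothetical, and the final call is left as a comment because it assumes ForestFit is installed.

```r
set.seed(1)
# Hypothetical raw data: a rough two-component Weibull sample (base R only)
x <- c(rweibull(15, shape = 1, scale = 2), rweibull(35, shape = 2, scale = 1))

m <- 10                                        # number of groups
r <- seq(min(x), max(x), length.out = m + 1)   # m + 1 group boundaries
# Frequencies of the m groups (r_{i-1}, r_i]; the smallest value is kept
f <- as.vector(table(cut(x, breaks = r, include.lowest = TRUE)))

K <- 2
starts <- c(0.3, 0.7,   # omega_1, omega_2 (must sum to one)
            1, 2,       # alpha_1, alpha_2 (shapes)
            2, 1)       # beta_1, beta_2 (scales)

# With ForestFit installed, the call following the Usage section would be:
# ForestFit::fitmixturegrouped("weibull", r, f, K, initial = TRUE, starts = starts)
```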

Details

The mixture models are assumed to be identifiable. For the skew-normal mixture model, the parameter vector of the k-th component takes the form \theta_k=(\alpha_k,\beta_k,\lambda_k)^{T}, where \alpha_k, \beta_k, and \lambda_k denote the location, scale, and skewness parameters, respectively.

Value

The output has two parts:

  1. A vector of the estimated weight, shape, and scale parameters.

  2. A sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.
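For orientation, the information criteria in the second part are functions of the maximized log-likelihood. The sketch below uses the common textbook definitions of AIC, BIC, and HQIC; the exact conventions used inside the package, including its definition of CAIC, are not stated here and may differ. The function name info_criteria, the parameter count p = 3K - 1, and the log-likelihood value are assumptions for illustration.

```r
# Textbook information criteria from a maximized log-likelihood.
# loglik: maximized log-likelihood, n: sample size, p: number of free parameters
info_criteria <- function(loglik, n, p) {
  c(AIC  = -2 * loglik + 2 * p,
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

K <- 2
p <- 3 * K - 1   # assumption: K shapes, K scales, and K - 1 free weights
ic <- info_criteria(loglik = -75.2, n = 50, p = p)   # hypothetical log-likelihood
```

Smaller values indicate a better trade-off between fit and complexity, so the criteria are mainly useful for comparing candidate families or numbers of components on the same grouped data.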

Author(s)

Mahdi Teimouri

References

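A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B, 39, 1-38.

G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578.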

Examples

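library(ForestFit)
n <- 50                        # sample size
K <- 2                         # number of components
m <- 10                        # number of groups
weight <- c(0.3, 0.7)          # mixing weights
alpha <- c(1, 2)               # shape parameters
beta <- c(2, 1)                # scale parameters
param <- c(weight, alpha, beta)
# Simulate raw data from the two-component Weibull mixture
data <- rmixture(n, "weibull", K, param)
# m + 1 equally spaced boundaries from the smallest to the largest observation
r <- seq(min(data), max(data), length.out = m + 1)
# Count the observations falling in each of the m groups
D <- data.frame(table(cut(data, r, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 4)))
f <- D$Freq
# Fit the mixture to the grouped data with automatic initial values
fitmixturegrouped("weibull", r, f, K, initial = FALSE)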

[Package ForestFit version 2.2.3 Index]