fitmixturegrouped {ForestFit} | R Documentation |
Estimating parameters of the well-known mixture models fitted to the grouped data
Description
Estimates the parameters of gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is
F(x,{\Theta}) = \sum_{k=1}^{K}\omega_k F_k(x,\theta_k),
where \Theta=(\theta_1,\dots,\theta_K)^T is the whole parameter vector, \theta_k=(\alpha_k,\beta_k)^{T} for k=1,\dots,K is the parameter vector of the k-th component, F_k(.,\theta_k) is the cdf of the k-th component, and the known constant K is the number of components. The parameters \alpha_k and \beta_k are the shape and scale parameters, respectively. The weights \omega_k sum to one, i.e. \sum_{k=1}^{K}\omega_k=1. The families considered for the cdf F_k are gamma, log-normal, and Weibull. Suppose a sample of n independent observations, each following a distribution with cdf F, has been divided into m separate groups of the form (r_{i-1},r_i], for i=1,\dots,m, and let f_i be the number of observations in the i-th group. Then the likelihood function of the observed data is given by
L(\Theta|f_1,\dots,f_m)=\frac{n!}{f_{1}!f_{2}!\dots f_{m}!}\prod_{i=1}^{m}\Bigl[\frac{F_i(\Theta)}{F(\Theta)}\Bigr]^{f_i},
where
F_i(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{i-1}}^{r_i}f(x|\theta_k)dx,
F(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_0}^{r_m}f(x|\theta_k)dx,
in which f(x|\theta_k) denotes the pdf of the k-th component. Using the EM algorithm proposed by Dempster et al. (1977), we can solve
\partial L(\Theta|f_1,\dots,f_m)/\partial \Theta=0
by introducing two new missing variables.
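As a rough illustration of the likelihood above, the following base R sketch evaluates the log of the grouped-data likelihood for a Weibull mixture. The helper names mix_cdf and grouped_loglik, the boundaries r, and the frequencies f are hypothetical and not part of ForestFit. The sketch assumes that \alpha and \beta map to the shape and scale arguments of pweibull, and that F(\Theta) is the mixture probability mass over (r_0, r_m]. It does not perform the EM iterations.

```r
# Mixture cdf F(x, Theta) = sum_k omega_k * F_k(x, theta_k), Weibull components.
# Assumption: alpha = shape and beta = scale, as in stats::pweibull.
mix_cdf <- function(x, weight, alpha, beta) {
  comp <- mapply(function(a, b) pweibull(x, shape = a, scale = b), alpha, beta)
  comp <- matrix(comp, nrow = length(x))  # rows: x values, columns: components
  as.vector(comp %*% weight)
}

# Log of the multinomial likelihood of frequencies f over boundaries r.
grouped_loglik <- function(r, f, weight, alpha, beta) {
  Fr <- mix_cdf(r, weight, alpha, beta)
  Fi <- diff(Fr)                    # F_i(Theta): mass in (r_{i-1}, r_i]
  Ftot <- Fr[length(r)] - Fr[1]     # F(Theta): mass in (r_0, r_m] (assumed)
  n <- sum(f)
  lgamma(n + 1) - sum(lgamma(f + 1)) + sum(f * log(Fi / Ftot))
}

# Hypothetical grouped data: 4 groups, 50 observations.
r <- c(0, 0.5, 1, 2, 4)
f <- c(12, 15, 16, 7)
ll <- grouped_loglik(r, f, weight = c(0.3, 0.7), alpha = c(1, 2), beta = c(2, 1))
```

The returned value is the same quantity that dmultinom(f, prob = Fi / Ftot, log = TRUE) would give for the group probabilities computed inside the function, which offers a quick cross-check.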
Usage
fitmixturegrouped(family, r, f, K, initial=FALSE, starts)
Arguments
family | Name of the family including: "
r | A numeric vector of length
f | A numeric vector of length
K | Number of components.
initial | The sequence of initial values including
starts | If
Details
The mixture model is assumed to be identifiable. For the skew-normal mixture model, the parameter vector of the k-th component takes the form \theta_k=(\alpha_k,\beta_k,\lambda_k)^{T}, where \alpha_k, \beta_k, and \lambda_k denote the location, scale, and skewness parameters, respectively.
Value
The output has two parts. The first part is the vector of estimated weight, shape, and scale parameters. The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
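For orientation, the likelihood-based criteria can be sketched from a log-likelihood value using their common textbook definitions. The function info_criteria below is hypothetical and is not how ForestFit necessarily computes its output. In particular, the count of free parameters (two per component plus K - 1 free weights) is an assumption, and CAIC is left out because several different definitions are in use.

```r
# Textbook information criteria from a log-likelihood value.
# Assumption: p = K * per_component + (K - 1) free parameters.
info_criteria <- function(loglik, n, K, per_component = 2) {
  p <- K * per_component + (K - 1)
  c(AIC  = -2 * loglik + 2 * p,
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

# Hypothetical log-likelihood of -10 for n = 50 observations and K = 2.
ic <- info_criteria(loglik = -10, n = 50, K = 2)
```

Smaller values indicate a better trade-off between fit and complexity when comparing candidate models fitted to the same grouped data.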
Author(s)
Mahdi Teimouri
References
G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578.
Examples
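library(ForestFit)
# Simulate n observations from a two-component Weibull mixture
n <- 50
K <- 2
m <- 10
weight <- c(0.3, 0.7)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
data <- rmixture(n, "weibull", K, param)
# Group the sample into m classes with equally spaced boundaries
r <- seq(min(data), max(data), length.out = m + 1)
D <- data.frame(table(cut(data, r, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 4)))
f <- D$Freq
# Fit the mixture to the grouped data
fitmixturegrouped("weibull", r, f, K, initial = FALSE)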
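The grouping step of the example can also be tried without ForestFit. The sketch below is only a stand-in under stated assumptions: it draws the mixture sample with base R's rbinom and rweibull instead of rmixture, assumes the same shape and scale convention as pweibull, and uses right-closed classes to match the (r_{i-1}, r_i] form in the Description. It then checks that the boundaries and frequencies have the lengths used in the example.

```r
set.seed(1)  # fixed seed, chosen arbitrarily for reproducibility
n <- 50
m <- 10

# Component labels: 1 with probability 0.7, 0 with probability 0.3.
z <- rbinom(n, size = 1, prob = 0.7)

# Two-component Weibull mixture (assumed shape/scale as in rweibull).
x <- ifelse(z == 1,
            rweibull(n, shape = 2, scale = 1),
            rweibull(n, shape = 1, scale = 2))

# m + 1 equally spaced boundaries and the m class frequencies.
r <- seq(min(x), max(x), length.out = m + 1)
f <- as.vector(table(cut(x, r, include.lowest = TRUE, right = TRUE)))

# Every observation falls into exactly one class.
stopifnot(length(r) == m + 1, length(f) == m, sum(f) == n)
```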