mixfit {mixR}

R Documentation

Finite Mixture Modeling for Raw Data and Binned Data

Description

This function performs maximum likelihood estimation for a variety of finite mixture models, for both raw and binned data, using the EM algorithm. Where the M-step has no closed-form solution, it is solved with the Newton-Raphson algorithm or the bisection method.

Usage

mixfit(
  x,
  ncomp = NULL,
  family = c("normal", "weibull", "gamma", "lnorm"),
  pi = NULL,
  mu = NULL,
  sd = NULL,
  ev = FALSE,
  mstep.method = c("bisection", "newton"),
  init.method = c("kmeans", "hclust"),
  tol = 1e-06,
  max_iter = 500
)

Arguments

`x`	a numeric vector for the raw data or a three-column matrix for the binned data
`ncomp`	a positive integer specifying the number of components of the mixture model
`family`	a character string specifying the family of the mixture model. It must be one of `normal`, `weibull`, `gamma` or `lnorm`.
`pi`	a vector of initial values for the proportion of each component
`mu`	a vector of initial values for the mean of each component
`sd`	a vector of initial values for the standard deviation of each component
`ev`	a logical value controlling whether each component has the same variance when fitting normal mixture models. It is ignored when fitting other mixture models. The default is `FALSE`.
`mstep.method`	a character string specifying the method used in the M-step of the EM algorithm when fitting Weibull or gamma mixture models. It can be either `bisection` or `newton`. The default is `bisection`.
`init.method`	a character string specifying the method used to provide initial values for the parameters of the EM algorithm. It can be either `kmeans` or `hclust`. The default is `kmeans`.
`tol`	the tolerance for the stopping rule of the EM algorithm. The algorithm stops when the log-likelihoods of two consecutive iterations differ by less than `tol`. The default value is 1e-6.
`max_iter`	the maximum number of iterations for the EM algorithm (default 500).

Details

The function mixfit is the core function of this package. It performs maximum likelihood estimation for finite mixture models from the normal, Weibull, gamma or lognormal families by using the EM algorithm. When the family is Weibull or gamma, the M-step of the EM algorithm has no closed-form solution; it is solved numerically, either with the Newton algorithm by specifying mstep.method = "newton" or with the bisection method by specifying mstep.method = "bisection".
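To make the E-step, M-step and stopping rule concrete, the following is a minimal base R sketch of EM for a normal mixture on raw data. It is an illustration only, not the code mixR runs internally; the function name em_normal_sketch is hypothetical. It covers only the normal family, where the M-step has closed-form updates. For Weibull or gamma components the shape update would need a numerical root-finder instead.

```r
# Sketch only: EM for a normal mixture on raw data, NOT mixR's internal code.
# em_normal_sketch is a hypothetical name used for illustration.
em_normal_sketch <- function(x, pi, mu, sd, tol = 1e-6, max_iter = 500) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_along(pi), function(j) pi[j] * dnorm(x, mu[j], sd[j]))
    # log-likelihood at the parameters used in this E-step
    loglik <- sum(log(rowSums(dens)))
    # membership probabilities of each observation for each component
    resp <- dens / rowSums(dens)

    # M-step: closed-form updates for the normal family
    nj <- colSums(resp)
    pi <- nj / length(x)
    mu <- colSums(resp * x) / nj
    sd <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nj)

    # stop when two consecutive log-likelihoods differ by less than tol
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(pi = pi, mu = mu, sd = sd, iter = iter, loglik = loglik)
}

# Usage on simulated data (base R only)
set.seed(1)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))
fit <- em_normal_sketch(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))
```

The arguments tol and max_iter in the sketch play the same role as the arguments of the same name in mixfit, under the assumption that the stopping rule is the log-likelihood difference described in the Arguments section.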

The initial values of the EM algorithm can be provided by specifying the proportion of each component pi, the mean of each component mu and the standard deviation of each component sd. If one or more of these are not provided, they are estimated by K-means clustering or hierarchical clustering, as selected by init.method. If none of pi, mu and sd is provided, then ncomp must be provided so that initial values can be generated automatically. For normal mixture models, the argument ev controls whether all components share the same variance.
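As an illustration of how clustering can supply starting values, the sketch below derives pi, mu and sd from a K-means partition using base R. This is one plausible approach, not a description of what mixR does internally; init_from_kmeans is a hypothetical helper, and the choice of nstart = 10 is arbitrary.

```r
# Sketch only: starting values from a K-means partition of raw data.
# init_from_kmeans is a hypothetical helper, not part of mixR.
init_from_kmeans <- function(x, ncomp) {
  cl <- kmeans(x, centers = ncomp, nstart = 10)$cluster
  means <- tapply(x, cl, mean)
  ord <- order(means)  # report components in increasing order of mean
  list(
    pi = as.numeric(table(cl)[ord]) / length(x),  # cluster proportions
    mu = as.numeric(means[ord]),                  # cluster means
    sd = as.numeric(tapply(x, cl, sd)[ord])       # cluster standard deviations
  )
}

set.seed(1)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))
init <- init_from_kmeans(x, ncomp = 2)
# The result could then be passed on explicitly, e.g.
# mixfit(x, pi = init$pi, mu = init$mu, sd = init$sd)
```

Note that a cluster with a single observation would give an sd of NA in this sketch, so the helper is only suitable for reasonably sized, well-separated groups.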

Value

The function mixfit returns an object of class mixfitEM, a list whose items depend on the mixture model fitted. The items common to all models are:

`pi`	a numeric vector representing the estimated proportion of each component
`mu`	a numeric vector representing the estimated mean of each component
`sd`	a numeric vector representing the estimated standard deviation of each component
`iter`	a positive integer recording the number of EM iterations performed
`loglik`	the loglikelihood of the estimated mixture model for the data `x`
`aic`	the value of AIC of the estimated model for the data `x`
`bic`	the value of BIC of the estimated model for the data `x`
`data`	the data `x`
`comp.prob`	the probability that `x` belongs to each component
`family`	the family the mixture model belongs to
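For orientation, the sketch below shows how AIC and BIC are conventionally computed from a log-likelihood. The formulas are the textbook definitions; the parameter count and the helper name ic_sketch are assumptions made for illustration and are not taken from the mixR source, so the values need not match the `aic` and `bic` items exactly (for binned data in particular, the sample size used may differ).

```r
# Sketch only: textbook AIC/BIC from a log-likelihood.
# The parameter count is an assumption, not taken from the mixR source.
ic_sketch <- function(loglik, ncomp, n, ev = FALSE) {
  # (ncomp - 1) free proportions + ncomp means + ncomp (or 1 shared) sd
  npar <- (ncomp - 1) + ncomp + if (ev) 1 else ncomp
  c(aic = -2 * loglik + 2 * npar,
    bic = -2 * loglik + log(n) * npar)
}

# Two components, unequal variances, 200 observations: 5 free parameters
ic <- ic_sketch(loglik = -400, ncomp = 2, n = 200)
```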

For the Weibull mixture model, the following extra items are returned.

`k`	a numeric vector representing the estimated shape parameter of each component
`lambda`	a numeric vector representing the estimated scale parameter of each component

For the Gamma mixture model, the following extra items are returned.

`alpha`	a numeric vector representing the estimated shape parameter of each component
`lambda`	a numeric vector representing the estimated rate parameter of each component

For the lognormal mixture model, the following extra items are returned.

`mulog`	a numeric vector representing the estimated mean of each component on the log scale
`sdlog`	a numeric vector representing the estimated standard deviation of each component on the log scale
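The family-specific parameters above are related to the component mean and standard deviation by standard moment formulas. The sketch below writes these out in base R, assuming the parameterizations stated above (scale lambda for Weibull, rate lambda for gamma). The helper names are hypothetical and are introduced here only for illustration.

```r
# Sketch only: standard moment formulas; helper names are hypothetical.

# Gamma with shape alpha and rate lambda
gamma_to_mu_sd <- function(alpha, lambda) {
  list(mu = alpha / lambda, sd = sqrt(alpha) / lambda)
}

# Weibull with shape k and scale lambda
weibull_to_mu_sd <- function(k, lambda) {
  mu <- lambda * gamma(1 + 1 / k)
  sd <- lambda * sqrt(gamma(1 + 2 / k) - gamma(1 + 1 / k)^2)
  list(mu = mu, sd = sd)
}

# Lognormal with log-scale mean mulog and log-scale sd sdlog
lnorm_to_mu_sd <- function(mulog, sdlog) {
  mu <- exp(mulog + sdlog^2 / 2)
  list(mu = mu, sd = mu * sqrt(exp(sdlog^2) - 1))
}

g <- gamma_to_mu_sd(alpha = 4, lambda = 2)  # mean 2, sd 1
w <- weibull_to_mu_sd(k = 1, lambda = 3)    # exponential case: mean 3, sd 3
l <- lnorm_to_mu_sd(mulog = 0, sdlog = 1)
```

The functions are vectorized, so they can be applied to the per-component vectors directly.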

Examples

## fitting the normal mixture models
set.seed(103)
x <- rmixnormal(200, c(0.3, 0.7), c(2, 5), c(1, 1))
data <- bin(x, seq(-1, 8, 0.25))
fit1 <- mixfit(x, ncomp = 2)  # raw data
fit2 <- mixfit(data, ncomp = 2)  # binned data
fit3 <- mixfit(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))  # providing the initial values
fit4 <- mixfit(x, ncomp = 2, ev = TRUE)  # setting the same variance

## (not run) fitting the weibull mixture models
## x <- rmixweibull(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit5 <- mixfit(x, ncomp = 2, family = "weibull")  # raw data
## fit6 <- mixfit(data, ncomp = 2, family = "weibull")  # binned data

## (not run) fitting the Gamma mixture models
## x <- rmixgamma(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit7 <- mixfit(x, ncomp = 2, family = "gamma")  # raw data
## fit8 <- mixfit(data, ncomp = 2, family = "gamma")  # binned data

## (not run) fitting the lognormal mixture models
## x <- rmixlnorm(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit9 <- mixfit(x, ncomp = 2, family = "lnorm")  # raw data
## fit10 <- mixfit(data, ncomp = 2, family = "lnorm")  # binned data

[Package mixR version 0.2.0 Index]