mixfit {mixR} R Documentation

Finite Mixture Modeling for Raw Data and Binned Data

Description

This function performs maximum likelihood estimation for a variety of finite mixture models, for both raw and binned data, using the EM algorithm together with the Newton-Raphson algorithm or the bisection method when necessary.

Usage

mixfit(
  x,
  ncomp = NULL,
  family = c("normal", "weibull", "gamma", "lnorm"),
  pi = NULL,
  mu = NULL,
  sd = NULL,
  ev = FALSE,
  mstep.method = c("bisection", "newton"),
  init.method = c("kmeans", "hclust"),
  tol = 1e-06,
  max_iter = 500
)

Arguments

x

a numeric vector for the raw data or a three-column matrix for the binned data

ncomp

a positive integer specifying the number of components of the mixture model

family

a character string specifying the family of the mixture model. It must be one of normal, weibull, gamma or lnorm.

pi

a vector of initial values for the proportion of each component

mu

a vector of initial values for the mean of each component

sd

a vector of initial values for the standard deviation of each component

ev

a logical value controlling whether each component has the same variance when fitting normal mixture models. It is ignored when fitting other mixture models. The default is FALSE.

mstep.method

a character string specifying the method used in the M-step of the EM algorithm when fitting weibull or gamma mixture models. It can be either bisection or newton. The default is bisection.

init.method

a character string specifying the method used to provide initial values for the parameters of the EM algorithm. It can be either kmeans or hclust. The default is kmeans.

tol

the tolerance for the stopping rule of the EM algorithm. The algorithm stops when the difference between the loglikelihoods of two consecutive iterations is less than tol. The default value is 1e-6.

max_iter

the maximum number of iterations for the EM algorithm (default 500).
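The roles of tol and max_iter can be illustrated with a minimal base-R sketch of an EM loop for a normal mixture. This is not the implementation used by mixR; the function name em_sketch and the initial values below are hypothetical and serve only to show the stopping rule.

```r
# Hypothetical helper, NOT part of mixR: EM for a normal mixture
em_sketch <- function(x, pi, mu, sd, tol = 1e-6, max_iter = 500) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # weighted component densities, one column per component
    dens <- sapply(seq_along(pi), function(j) pi[j] * dnorm(x, mu[j], sd[j]))
    loglik <- sum(log(rowSums(dens)))
    # stopping rule: change in loglikelihood smaller than tol
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
    # E-step: probability that each observation belongs to each component
    post <- dens / rowSums(dens)
    # M-step: closed-form updates for the normal family
    nj <- colSums(post)
    pi <- nj / length(x)
    mu <- colSums(post * x) / nj
    sd <- sqrt(colSums(post * outer(x, mu, "-")^2) / nj)
  }
  list(pi = pi, mu = mu, sd = sd, iter = iter, loglik = loglik)
}

set.seed(1)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))
fit <- em_sketch(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))
fit$iter  # number of iterations used, at most max_iter
```

A smaller tol generally requires more iterations before the loop stops; max_iter bounds the run time when the loglikelihood changes slowly.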

Details

The function mixfit is the core function in this package. It performs maximum likelihood estimation for finite mixture models from the normal, Weibull, Gamma or lognormal families by using the EM algorithm. When the family is weibull or gamma, the M-step of the EM algorithm has no closed-form solution, and it is solved numerically either by the Newton-Raphson algorithm (mstep.method = "newton") or by the bisection method (mstep.method = "bisection").

The initial values of the EM algorithm can be provided by specifying the proportion of each component pi, the mean of each component mu and the standard deviation of each component sd. If one or more of these initial values are not provided, they are estimated by K-means clustering or hierarchical clustering, depending on init.method. If none of pi, mu and sd is provided, ncomp should be provided so that initial values can be generated automatically. For normal mixture models, the argument ev controls whether all components share the same variance.
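As a hedged illustration of the options described above, the following sketch fits a Weibull mixture with a non-default M-step method and with non-default initialization. It assumes mixR is installed and uses only arguments listed under Usage; the seed and simulated data are arbitrary, and convergence on other data is not guaranteed.

```r
if (requireNamespace("mixR", quietly = TRUE)) {
  library(mixR)
  set.seed(104)
  x <- rmixweibull(200, c(0.3, 0.7), c(2, 5), c(1, 1))

  # Newton-Raphson in the M-step instead of the default bisection
  fit_newton <- mixfit(x, ncomp = 2, family = "weibull", mstep.method = "newton")

  # hierarchical clustering for the initial values instead of K-means
  fit_hclust <- mixfit(x, ncomp = 2, family = "weibull", init.method = "hclust")

  # compare the loglikelihoods reached by the two fits
  c(newton = fit_newton$loglik, hclust = fit_hclust$loglik)
}
```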

Value

The function mixfit returns an object of class mixfitEM, which is a list whose items depend on the mixture model fitted. The common items include

pi

a numeric vector representing the estimated proportion of each component

mu

a numeric vector representing the estimated mean of each component

sd

a numeric vector representing the estimated standard deviation of each component

iter

a positive integer recording the number of EM iterations performed

loglik

the loglikelihood of the estimated mixture model for the data x

aic

the value of AIC of the estimated model for the data x

bic

the value of BIC of the estimated model for the data x

data

the data x

comp.prob

the probability that each observation in x belongs to each component

family

the family the mixture model belongs to

For the Weibull mixture model, the following extra items are returned.

k

a numeric vector representing the estimated shape parameter of each component

lambda

a numeric vector representing the estimated scale parameter of each component

For the Gamma mixture model, the following extra items are returned.

alpha

a numeric vector representing the estimated shape parameter of each component

lambda

a numeric vector representing the estimated rate parameter of each component

For the lognormal mixture model, the following extra items are returned.

mulog

a numeric vector representing the estimated mean on the log scale of each component

sdlog

a numeric vector representing the estimated standard deviation on the log scale of each component
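The family-specific parameters relate to the component mean and standard deviation through standard textbook formulas. The base-R sketch below assumes the usual parameterizations (Weibull with shape k and scale lambda, Gamma with shape alpha and rate lambda, lognormal with mulog and sdlog on the log scale); the helper names are hypothetical and are not part of mixR.

```r
# Hypothetical helpers, NOT part of mixR

# Weibull with shape k and scale lambda
weibull_mean_sd <- function(k, lambda) {
  m <- lambda * gamma(1 + 1 / k)
  v <- lambda^2 * gamma(1 + 2 / k) - m^2
  c(mu = m, sd = sqrt(v))
}

# Gamma with shape alpha and rate lambda
gamma_mean_sd <- function(alpha, lambda) {
  c(mu = alpha / lambda, sd = sqrt(alpha) / lambda)
}

# Lognormal with mean mulog and standard deviation sdlog on the log scale
lnorm_mean_sd <- function(mulog, sdlog) {
  m <- exp(mulog + sdlog^2 / 2)
  c(mu = m, sd = m * sqrt(exp(sdlog^2) - 1))
}

gamma_mean_sd(alpha = 4, lambda = 2)
weibull_mean_sd(k = 1, lambda = 3)
lnorm_mean_sd(mulog = 0, sdlog = 0.5)
```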

See Also

plot.mixfitEM, density.mixfitEM, select, bs.test

Examples

## fitting the normal mixture models
set.seed(103)
x <- rmixnormal(200, c(0.3, 0.7), c(2, 5), c(1, 1))
data <- bin(x, seq(-1, 8, 0.25))
fit1 <- mixfit(x, ncomp = 2)  # raw data
fit2 <- mixfit(data, ncomp = 2)  # binned data
fit3 <- mixfit(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))  # providing the initial values
fit4 <- mixfit(x, ncomp = 2, ev = TRUE)  # setting the same variance

## (not run) fitting the weibull mixture models
## x <- rmixweibull(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit5 <- mixfit(x, ncomp = 2, family = "weibull")  # raw data
## fit6 <- mixfit(data, ncomp = 2, family = "weibull")  # binned data

## (not run) fitting the Gamma mixture models
## x <- rmixgamma(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit7 <- mixfit(x, ncomp = 2, family = "gamma")  # raw data
## fit8 <- mixfit(data, ncomp = 2, family = "gamma")  # binned data

## (not run) fitting the lognormal mixture models
## x <- rmixlnorm(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit9 <- mixfit(x, ncomp = 2, family = "lnorm")  # raw data
## fit10 <- mixfit(data, ncomp = 2, family = "lnorm")  # binned data


[Package mixR version 0.2.0 Index]