R: EM and MLE Estimation of Univariate Normal Mixtures

norMixFit {nor1mix}

R Documentation

EM and MLE Estimation of Univariate Normal Mixtures

Description

These functions estimate the parameters of a univariate (finite) normal mixture using the EM algorithm or likelihood maximization via optim(.., method = "BFGS").

Usage

norMixEM(x, m, name = NULL, sd.min = 1e-07 * diff(range(x))/m,
         trafo = c("clr1", "logit"),
         maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 1)

norMixMLE(x, m, name = NULL, 
          trafo = c("clr1", "logit"),
          maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 2)

Arguments

`x`	numeric: the data for which the parameters are to be estimated.
`m`	integer or factor: if `m` has length 1, it specifies the number of mixture components; otherwise it is taken to be a vector of initial cluster assignments, see Details below.
`name`	character, passed to `norMix`. The default, `NULL`, uses `match.call()`.
`sd.min`	number: the minimal value that the normal components' standard deviations (`sd`) are allowed to take. A warning is printed if some of the final `sd`'s are at this boundary.
`trafo`	`character` string specifying the transformation of the component weight `w` `m`-vector (mathematical notation in `norMix`: `\pi_j, j=1,\dots,m`) to an `(m-1)`-dimensional unconstrained parameter vector in our parametrization. See `nM2par` for details.
`maxiter`	integer: maximum number of EM iterations.
`tol`	numeric: EM iterations stop if relative changes of the log-likelihood are smaller than `tol`.
`trace`	integer (or logical) specifying whether the iterations should be traced and how much output should be produced. The default, `1`, prints a final one-line summary, whereas `trace = 2` produces one line of output per iteration.

Details

Estimation of univariate mixtures can be very sensitive to initialization. By default, norMixEM and norMixMLE cut the data into m groups of approximately equal size. See the Examples for other initialization possibilities.
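As a rough illustration of "m groups of approximately equal size", the following base R sketch cuts toy data at its empirical quantiles. The toy data, the variable names and the use of quantile() are assumptions made for this illustration; the package's internal initialization may be implemented differently.

```r
## Hypothetical sketch: equal-size initial groups via empirical quantiles.
## Not the package's own code; names and data are made up for illustration.
set.seed(1)
x <- c(rnorm(60, -1, 1), rnorm(40, 4, 0.5))   # toy data
m <- 3

## Break points at the (j/m)-quantiles give m groups of roughly equal size
breaks <- quantile(x, probs = seq(0, 1, length.out = m + 1))
init <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(init)   # group sizes differ by at most a few observations
```

A vector such as `init` has the form of the initial cluster assignments that the `m` argument accepts when its length is greater than 1.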

The EM algorithm consists of repeated application of E- and M-steps until convergence. Mainly for didactic reasons, we also provide the functions estep.nm, mstep.nm, and emstep.nm.
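To make the E- and M-steps concrete, here is a minimal base R sketch of the textbook EM update for a univariate normal mixture. The helper `em_step()` and its arguments are hypothetical names chosen for this illustration; it is not the code of estep.nm, mstep.nm or emstep.nm, and it omits safeguards such as the `sd.min` bound.

```r
## Hypothetical helper: one textbook EM iteration (E-step + M-step).
em_step <- function(x, w, mu, sigma) {
  m <- length(w)
  ## E-step: n x m matrix of posterior component membership probabilities
  dens <- sapply(seq_len(m), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
  z <- dens / rowSums(dens)
  ## M-step: weighted updates of weights, means and standard deviations
  nj <- colSums(z)
  mu_new <- colSums(z * x) / nj
  sigma_new <- sqrt(colSums(z * outer(x, mu_new, "-")^2) / nj)
  list(w = nj / length(x), mu = mu_new, sigma = sigma_new,
       loglik = sum(log(rowSums(dens))))  # log-likelihood at the *input* parameters
}

set.seed(42)
x <- c(rnorm(150, 0, 1), rnorm(100, 5, 1.5))           # toy data
p <- list(w = c(0.5, 0.5), mu = c(-1, 6), sigma = c(2, 2))
tol <- sqrt(.Machine$double.eps)
ll_old <- -Inf
for (it in 1:100) {
  p <- em_step(x, p$w, p$mu, p$sigma)
  ## stop when the relative change of the log-likelihood is below tol
  if (abs(p$loglik - ll_old) < tol * abs(p$loglik)) break
  ll_old <- p$loglik
}
str(p[c("w", "mu", "sigma")])
```

The stopping rule mirrors the description of `tol` above; the starting values are arbitrary and only meant to be plausible for the toy data.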

The maximum likelihood estimator (MLE) maximizes the likelihood via optim, using the same advantageous parametrization as llnorMix.
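The idea of direct likelihood maximization over an unconstrained parameter vector can be sketched in base R for the two-component case. The function `negll()` and the specific transformations used here (logit for the first weight, log for the standard deviations) are assumptions made for this illustration; the parametrization actually used by the package is the one documented in nM2par and llnorMix.

```r
## Hypothetical sketch: MLE of a 2-component mixture via optim(method = "BFGS").
## par = (logit(w1), mu1, mu2, log(sigma1), log(sigma2)), all unconstrained.
negll <- function(par, x) {
  w1    <- plogis(par[1])      # back-transform to a weight in (0, 1)
  mu    <- par[2:3]
  sigma <- exp(par[4:5])       # back-transform to positive sd's
  -sum(log(w1 * dnorm(x, mu[1], sigma[1]) +
           (1 - w1) * dnorm(x, mu[2], sigma[2])))
}

set.seed(7)
x <- c(rnorm(150, 0, 1), rnorm(100, 5, 1.5))   # toy data
start <- c(0, unname(quantile(x, c(0.25, 0.75))), rep(log(sd(x)), 2))
fit <- optim(start, negll, x = x, method = "BFGS")

## Map the optimum back to the natural (w, mu, sigma) scale
est <- list(w     = c(plogis(fit$par[1]), 1 - plogis(fit$par[1])),
            mu    = fit$par[2:3],
            sigma = exp(fit$par[4:5]))
str(est)
```

Working on the transformed scale lets BFGS run without box constraints, since every real-valued `par` corresponds to a valid mixture.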

Value

An object of class norMix.

Author(s)

EM: Friedrich Leisch, originally; Martin Maechler vectorized it in m, added trace etc.

MLE: M.Maechler

Examples

## use (mu, sigma)
ex  <- norMix(mu = c(-1,2,5), sigma = c(1, 1/sqrt(2), sqrt(3)))
tools::assertWarning(verbose=TRUE,
           ## *deprecated* (using 'sig2' will *NOT* work in future!)
           ex. <- norMix(mu = c(-1,2,5), sig2 = c(1, 0.5, 3))
       )
stopifnot(all.equal(ex, ex.))
plot(ex, col="gray", p.norm=FALSE)

x <- rnorMix(100, ex)
lines(density(x))
rug(x)

## EM estimation may fail depending on random sample
ex1 <- norMixEM(x, 3, trace=2) #-> warning (sometimes)
ex1
plot(ex1)

## initialization by cut() into intervals of equal length:
ex2 <- norMixEM(x, cut(x, 3))
ex2

## initialization by kmeans():
k3 <- kmeans(x, 3)$cluster
ex3 <- norMixEM(x, k3)
ex3

## Now, MLE instead of EM:
exM <- norMixMLE(x, k3, tol = 1e-12, trace=4)
exM

## real data
data(faithful)
plot(density(faithful$waiting, bw = "SJ"), ylim=c(0,0.044))
rug(faithful$waiting)

(nmF <- norMixEM(faithful$waiting, 2))
lines(nmF, col=2)
## are three components better?
nmF3 <- norMixEM(faithful$waiting, 3, maxiter = 200)
lines(nmF3, col="forestgreen")

[Package nor1mix version 1.3-3 Index]