mvnpEM {mixtools}R Documentation

EM-like Algorithm for Nonparametric Mixture Models with Conditionally Independent Multivariate Component Densities

Description

An extension of the original npEM algorithm, for mixtures of multivariate data where the coordinates of a row (case) in the data matrix are assumed to be made of independent but multivariate blocks (instead of just coordinates), conditional on the mixture component (subpopulation) from which they are drawn (Chauveau and Hoang 2015).

Usage

mvnpEM(x, mu0, blockid = 1:ncol(x), samebw = TRUE, 
       bwdefault = apply(x,2,bw.nrd0), init = NULL,
       eps = 1e-8, maxiter = 500, verb = TRUE)

Arguments

x

An n\times r matrix of data. Each of the n rows is a case, and each case has r repeated measurements. These measurements are assumed to be conditionally independent, conditional on the mixture component (subpopulation) from which the case is drawn.

mu0

Either an m\times r matrix specifying the initial centers for the kmeans function, or an integer m specifying the number of initial centers, which are then chosen randomly in kmeans

blockid

A vector of length r identifying coordinates (columns of x) that are in the same block. The default has all distinct elements, indicating that the model has r blocks of dimension 1, in which case the model is handled directly by the npEM algorithm. See example below for actual multivariate blocks example.

samebw

Logical: If TRUE, use the same bandwidth per coordinate for all iteration and all components. If FALSE, use a separate bandwidth for each component and coordinate, and update this bandwidth at each iteration of the algorithm using a suitably modified bw.nrd0 method as described in Benaglia et al (2011) and Chauveau and Hoang (2015).

bwdefault

Bandwidth default for density estimation,a simplistic application of the default bw.nrd0 for each coordinate (column) of the data.

init

Initialization method, based on an initial n\times m matrix for the posterior probabilities. If NULL, a kmeans clustering with mu0 initial centers is applied to the data and the initial matrix of posteriors is built from the result.

eps

Tolerance limit for declaring algorithm convergence. Convergence is declared whenever the maximum change in any coordinate of the lambda vector (of mixing proportion estimates) does not exceed eps.

maxiter

The maximum number of iterations allowed; convergence may be declared before maxiter iterations (see eps above).

verb

Verbose mode; if TRUE, print updates for every iteration of the algorithm as it runs

Value

mvnpEM returns a list of class mvnpEM with the following items:

data

The raw data (an n\times r matrix).

posteriors

An n\times m matrix of posterior probabilities for each observation (row).

lambda

The sequence of mixing proportions over iterations.

blockid

The blockid input argument. Needed by any method that produces density estimates from the output, like plot.mvnpEM.

samebw

The samebw input argument. Needed by any method that produces density estimates from the output, like plot.mvnpEM.

bandwidth

The final bandwidth matrix after convergence of the algorithm. Its shape depends on the samebw input argument. If samebw = TRUE, a vectors with the bandwidth value for each of the r coordinates (same for all components and iterations). If samebw = FALSE, a m\times r matrix, where each row is associated to one component and gives the r bandwidth values, one for each coordinate. Needed by any method that produces density estimates from the output, like plot.mvnpEM.

lambdahat

The final mixing proportions.

loglik

The sequence of pseudo log-likelihood values over iterations.

References

See Also

plot.mvnpEM, npEM

Examples

# Example as in Chauveau and Hoang (2015) with 6 coordinates
## Not run: 
m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks 
# generate some data x ...
a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth
plot(a) # this S3 method produces 6 plots of univariate marginals
summary(a)
## End(Not run)

[Package mixtools version 2.0.0 Index]