pmclust-package {pmclust} | R Documentation |
Parallel Model-Based Clustering
Description
The pmclust aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The package employs pbdMPI to perform a parallel version of expectation and maximization (EM) algorithm for finite mixture Gaussian models. The unstructured dispersion matrices are assumed in the Gaussian models. The implementation is default in the single program multiple data (SPMD) programming model. The code can be executed through pbdMPI and independent to most MPI applications. See the High Performance Statistical Computing (HPSC) website for more information, documents and examples.
Details
Package: | pmclust |
Type: | Package |
License: | GPL |
LazyLoad: | yes |
The main function is pmclust
implementing the
parallel EM algorithm for mixture multivariate Gaussian models
with unstructured dispersions. This function groups a data matrix
X.gbd
or X.spmd
into K
clusters where X.gbd
or X.spmd
is
potentially huge and taken from the global environment .GlobalEnv
or .pmclustEnv
.
Other main functions em.step
, aecm.step
,
apecm.step
, and
apecma.step
may provide better performance than the
em.step
in terms of computing time and convergent
iterations.
kmeans.step
provides the fastest clustering among above
algorithms, but it is restricted by Euclidean distance and spherical
dispersions.
Author(s)
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov
References
Programming with Big Data in R Website: https://pbdr.org/
Chen, W.-C. and Maitra, R. (2011) “Model-based clustering of regression time series data via APECM – an AECM algorithm sung to an even faster beat”, Statistical Analysis and Data Mining, 4, 567-578.
Chen, W.-C., Ostrouchov, G., Pugmire, D., Prabhat, M., and Wehner, M. (2013) “A Parallel EM Algorithm for Model-Based Clustering with Application to Explore Large Spatio-Temporal Data”, Technometrics, (revision).
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society Series B, 39, 1-38.
Lloyd., S. P. (1982) “Least squares quantization in PCM”, IEEE Transactions on Information Theory, 28, 129-137.
Meng, X.-L. and Van Dyk, D. (1997) “The EM Algorithm – an Old Folk-song Sung to a Fast New Tune”, Journal of the Royal Statistical Society Series B, 59, 511-567.
See Also
em.step
,
aecm.step
,
apecm.step
,
apecma.step
,
kmeans.step
.
Examples
## Not run:
### Under command mode, run the demo with 2 processors by
### (Use Rscript.exe for windows system)
mpiexec -np 2 Rscript -e 'demo(gbd_em,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(gbd_aecm,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(gbd_apecm,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(gbd_apecma,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(gbd_kmeans,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(ex_em,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(ex_aecm,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(ex_apecm,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(ex_apecma,"pmclust",ask=F,echo=F)'
mpiexec -np 2 Rscript -e 'demo(ex_kmeans,"pmclust",ask=F,echo=F)'
## End(Not run)