CNmixt {ContaminatedMixt}  R Documentation 
Description

Fits, by using the expectation-conditional maximization (ECM) algorithm, parsimonious mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) to the given data, within a clustering paradigm (default) or a classification paradigm. Can be run in parallel. Likelihood-based model selection criteria are used to select the parsimonious model and the number of groups.
Usage

CNmixt(X, G, contamination = NULL, model = NULL, initialization = "mixt",
       alphafix = NULL, alphamin = 0.5, seed = NULL, start.z = NULL,
       start.v = NULL, start = 0, label = NULL, AICcond = FALSE,
       iter.max = 1000, threshold = 1.0e-10, parallel = FALSE,
       eps = 1e-100, verbose = TRUE)

CNmixtCV(X, G, contamination = NULL, model = NULL, initialization = "mixt",
         k = 10, alphafix = NULL, alphamin = 0.5, seed = NULL,
         start.z = NULL, start.v = NULL, start = 0, label = NULL,
         iter.max = 1000, threshold = 1.0e-10, parallel = FALSE,
         eps = 1e-100, verbose = TRUE)
Arguments

X

a matrix or data frame whose rows correspond to observations and whose columns correspond to variables.
G 
a vector containing the numbers of groups to be tried. 
contamination

an optional boolean indicating if the model(s) to be fitted have to be contaminated or not.
If NULL (default), both contaminated and uncontaminated models are fitted.
model

a vector indicating the model(s) to be fitted.
In the multivariate case (p > 1), possible values are: "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", and "VVV"; in the univariate case (p = 1), possible values are "E" and "V".
If NULL (default), all models are fitted.
initialization

initialization strategy for the ECM algorithm. It can be: "mixt" (default), "kmeans", "random.post", "random.clas", or "manual".
alphafix

a vector of length G with the proportion of good observations in each group.
If a single value is provided, it is replicated G times. If NULL (default), these proportions are estimated from the data.
alphamin

a vector of length G with the minimum proportion of good observations in each group.
If a single value is provided, it is replicated G times. Default value is 0.5.
seed

the seed for the random number generator, when random initializations are used; if NULL (default), the current seed is not changed.
start.z

initial n × G matrix of either soft or hard classification.
Default value is NULL.
start.v

initial n × G matrix of posterior probabilities to be a good observation in each group. Default value is an n × G matrix of ones.
start

when initialization = "random.post" or initialization = "random.clas", the number of random starts to be tried. Default value is 0.
label

a vector of integers of length equal to the number of rows of X, specifying the known group membership of each observation; 0 denotes an observation of unknown membership. If NULL (default), all memberships are treated as unknown (clustering paradigm).
AICcond

when TRUE, the AICcond criterion (Vandewalle et al., 2013) is computed to evaluate the predictive ability of the fitted model(s) in the classification paradigm. Default value is FALSE.
iter.max

maximum number of iterations in the ECM algorithm.
Default value is 1000.
threshold

threshold for Aitken's acceleration procedure.
Default value is 1.0e-10.
parallel

when TRUE, the models are fitted in parallel using the parallel package. Default value is FALSE.
eps

an optional scalar.
It sets the smallest value for the eigenvalues of the component scale matrices.
Default value is 1e-100.
k

number of equal-sized subsamples used in k-fold cross-validation. Default value is 10.
verbose

when TRUE (default), progress is written to the console during fitting.
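As a sketch of the role played by the threshold argument, the following illustrates an Aitken-based stopping rule of the kind referred to above. It is a minimal sketch, not package code: aitken_converged is a hypothetical helper, and the geometric log-likelihood sequence is made up for illustration.

```r
# Illustrative sketch (not package code): an Aitken-based stopping rule.
# Given three successive log-likelihoods l_{k-2}, l_{k-1}, l_k, the Aitken
# acceleration a = (l_k - l_{k-1}) / (l_{k-1} - l_{k-2}) yields an asymptotic
# estimate of the limiting log-likelihood; iteration stops when that estimate
# is within 'threshold' of the current log-likelihood.
aitken_converged <- function(loglik, threshold = 1.0e-10) {
  k <- length(loglik)
  if (k < 3) return(FALSE)
  a <- (loglik[k] - loglik[k - 1]) / (loglik[k - 1] - loglik[k - 2])
  l_inf <- loglik[k - 1] + (loglik[k] - loglik[k - 1]) / (1 - a)
  abs(l_inf - loglik[k]) < threshold
}

# A geometrically converging (made-up) log-likelihood sequence
ll <- -100 + 10 * (1 - 0.5^(0:40))
aitken_converged(ll[1:5])    # FALSE: the estimate is still moving
aitken_converged(ll[1:40])   # TRUE
```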
Details

The multivariate data contained in X are either clustered or classified using parsimonious mixtures of multivariate contaminated normal distributions with some or all of the 14 parsimonious models described in Punzo and McNicholas (2016).
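To make the component distribution concrete, the sketch below evaluates the density of a multivariate contaminated normal distribution: a two-component Gaussian mixture in which a proportion alpha of "good" observations follows N(mu, Sigma) and the remaining "bad" observations follow N(mu, eta * Sigma) with inflation parameter eta > 1 (Punzo and McNicholas, 2016). dmvn and dcontnorm are illustrative helpers, not package functions.

```r
# Hand-rolled multivariate normal density, to keep the sketch dependency-free
dmvn <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  exp(-0.5 * drop(t(d) %*% solve(Sigma) %*% d)) /
    sqrt((2 * pi)^p * det(Sigma))
}

# Contaminated normal density: good points from N(mu, Sigma),
# bad points from N(mu, eta * Sigma) with inflated scale matrix
dcontnorm <- function(x, mu, Sigma, alpha = 0.95, eta = 10) {
  alpha * dmvn(x, mu, Sigma) + (1 - alpha) * dmvn(x, mu, eta * Sigma)
}

mu <- c(0, 0); Sigma <- diag(2)
# The contamination inflates the tails relative to the plain normal:
dcontnorm(c(5, 5), mu, Sigma) > dmvn(c(5, 5), mu, Sigma)   # TRUE
```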
Model specification (via the model argument) follows the nomenclature popularized in other packages such as mixture and mclust.
Such a nomenclature refers to the decomposition and constraints on the scale matrix (see Banfield and Raftery, 1993, Celeux and Govaert, 1995, and Punzo and McNicholas, 2016 for details):

Σ_g = λ_g Γ_g Δ_g Γ_g'.
The nomenclature describes, in order, the volume (λ_g), shape (Δ_g), and orientation (Γ_g), in terms of "V"ariable, "E"qual, or the "I"dentity matrix.
As an example, the string "VEI" would refer to the model where Σ_g = λ_g Δ.
Note that for G = 1, several models are equivalent (for example, "EEE" and "VVV").
Thus, for G = 1 only one model from each set of equivalent models will be run.
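The nomenclature above can be illustrated by assembling group scale matrices directly from the decomposition; make_Sigma is a hypothetical helper for this sketch, not part of the package.

```r
# Illustrative sketch (not package code): build Sigma_g from the decomposition
# Sigma_g = lambda_g * Gamma_g %*% Delta_g %*% t(Gamma_g), where
# lambda_g is the volume (scalar), Delta_g the shape (diagonal, det = 1),
# and Gamma_g the orientation (orthogonal matrix).
make_Sigma <- function(lambda, Delta, Gamma = diag(nrow(Delta))) {
  lambda * Gamma %*% Delta %*% t(Gamma)
}

# "VEI": Variable volume, Equal shape, Identity orientation
Delta  <- diag(c(2, 0.5))                 # shared shape, det(Delta) = 1
Sigma1 <- make_Sigma(lambda = 1, Delta)   # group 1
Sigma2 <- make_Sigma(lambda = 4, Delta)   # group 2: same shape/orientation, larger volume
det(Sigma2) / det(Sigma1)                 # 16, i.e. (lambda_2 / lambda_1)^p with p = 2
```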
The algorithms detailed in Celeux and Govaert (1995) are considered in the first CM-step of the ECM algorithm to update Σ_g for all the models apart from "EVE" and "VVE".
For "EVE" and "VVE", majorization-minimization (MM) algorithms (Hunter and Lange, 2000) and accelerated line search algorithms on the Stiefel manifold (Absil, Mahony and Sepulchre, 2009; Browne and McNicholas, 2014), which are especially preferable in higher dimensions (Browne and McNicholas, 2014), are used to update Σ_g; the same approach is also adopted in the mixture package for those models.
Starting values are very important to the successful operation of these algorithms and so care must be taken in the interpretation of results.
All the initializations considered here provide initial quantities for the first CM-step of the ECM algorithm.
The predictive ability of a model for classification may be estimated using the cross-validated error rate, returned by CNmixtCV, or through the AICcond criterion (Vandewalle et al., 2013).
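The k-fold cross-validated error rate can be sketched generically as follows. This is a minimal sketch of the general idea, not a reproduction of CNmixtCV's internals: cv_error, fit_predict, and nearest_mean are hypothetical stand-ins, with a toy nearest-class-mean classifier in place of the mixture model fit.

```r
# Illustrative sketch (not package code): generic k-fold cross-validated
# error rate. fit_predict(Xtr, ytr, Xte) stands in for fitting a classifier
# on the training folds and classifying the held-out fold.
cv_error <- function(X, labels, k = 10, fit_predict) {
  n <- nrow(X)
  folds <- sample(rep(1:k, length.out = n))      # equal-sized random folds
  errs <- sapply(1:k, function(f) {
    test <- folds == f
    pred <- fit_predict(X[!test, , drop = FALSE], labels[!test],
                        X[test, , drop = FALSE])
    mean(pred != labels[test])                   # misclassification rate
  })
  mean(errs)
}

# Toy stand-in classifier: assign each test point to the nearest class mean
nearest_mean <- function(Xtr, ytr, Xte) {
  classes <- sort(unique(ytr))
  mus <- sapply(classes, function(g) colMeans(Xtr[ytr == g, , drop = FALSE]))
  apply(Xte, 1, function(x) classes[which.min(colSums((mus - x)^2))])
}

set.seed(1)
X <- rbind(matrix(rnorm(100, -3), 50), matrix(rnorm(100, 3), 50))
y <- rep(1:2, each = 50)
cv_error(X, y, k = 5, fit_predict = nearest_mean)  # near 0 for well-separated groups
```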
Value

CNmixt returns an object of class ContaminatedMixt.

CNmixtCV returns a list with the cross-validated error rate estimated for each model.
Author(s)

Antonio Punzo, Angelo Mazza, Paul D. McNicholas
References

Absil P. A., Mahony R. and Sepulchre R. (2009). Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ.
Banfield J. D. and Raftery A. E. (1993). Model-Based Gaussian and Non-Gaussian Clustering. Biometrics, 49(3), 803–821.
Browne R. P. and McNicholas P. D. (2013). Estimating Common Principal Components in High Dimensions. Advances in Data Analysis and Classification, 8(2), 217–226.
Browne, R. P. and McNicholas P. D. (2014). Orthogonal Stiefel manifold optimization for eigendecomposed covariance parameter estimation in mixture models. Statistics and Computing, 24(2), 203–210.
Browne R. P. and McNicholas P. D. (2015). mixture: Mixture Models for Clustering and Classification. R package version 1.4.
Celeux G. and Govaert G. (1995). Gaussian Parsimonious Clustering Models. Pattern Recognition, 28(5), 781–793.
Hunter D. R. and Lange K. (2000). Rejoinder to Discussion of “Optimization Transfer Using Surrogate Objective Functions”. Journal of Computational and Graphical Statistics, 9(1), 52–59.
Punzo A., Mazza A. and McNicholas P. D. (2018). ContaminatedMixt: An R Package for Fitting Parsimonious Mixtures of Multivariate Contaminated Normal Distributions. Journal of Statistical Software, 85(10), 1–25.
Punzo A. and McNicholas P. D. (2016). Parsimonious mixtures of multivariate contaminated normal distributions. Biometrical Journal, 58(6), 1506–1537.
Vandewalle V., Biernacki C., Celeux G. and Govaert G. (2013). A predictive deviance criterion for selecting a generative model in semi-supervised classification. Computational Statistics and Data Analysis, 64, 220–236.
Examples

## Note that the example is extremely simplified
## in order to reduce computation time

# Artificial data from an EEI Gaussian mixture with G = 2 components

library("mnormt")
p <- 2
set.seed(12345)
X1 <- rmnorm(n = 200, mean = rep(2, p), varcov = diag(c(5, 0.5)))
X2 <- rmnorm(n = 200, mean = rep(-2, p), varcov = diag(c(5, 0.5)))
noise <- matrix(runif(n = 40, min = -20, max = 20), nrow = 20, ncol = 2)
X <- rbind(X1, X2, noise)

group <- rep(c(1, 2, 3), times = c(200, 200, 20))
plot(X, col = group, pch = c(3, 4, 16)[group], asp = 1,
     xlab = expression(X[1]), ylab = expression(X[2]))

# ---------------------- #
# Model-based clustering #
# ---------------------- #

res1 <- CNmixt(X, model = c("EEI", "VVV"), G = 2, parallel = FALSE)
summary(res1)
agree(res1, givgroup = group)
plot(res1, contours = TRUE, asp = 1,
     xlab = expression(X[1]), ylab = expression(X[2]))

# -------------------------- #
# Model-based classification #
# -------------------------- #

indlab <- sample(1:400, 20)
lab <- rep(0, nrow(X))
lab[indlab] <- group[indlab]
res2 <- CNmixt(X, G = 2, model = "EEI", label = lab)
agree(res2, givgroup = group)