Mclust {mclust}    R Documentation
Model-Based Clustering
Description
Model-based clustering based on parameterized finite Gaussian mixture models. Models are estimated by the EM algorithm, initialized by hierarchical model-based agglomerative clustering. The optimal model is then selected according to BIC.
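To make the EM step concrete, here is a minimal base-R sketch for univariate data with two components and component-specific variances (loosely analogous to mclust's "V" model). This is not mclust's implementation. The function name em_gmm1d, the rank-based initial split (a stand-in for mclust's hierarchical or quantile initialization), and the stopping rule are hypothetical choices made for illustration.

```r
# Hypothetical helper: EM for a univariate Gaussian mixture with G components
# and unequal variances. Not part of mclust; for illustration only.
em_gmm1d <- function(x, G = 2, iter = 200, tol = 1e-8) {
  n <- length(x)
  # Hypothetical initialization: split the ranked data into G equal-size groups
  grp <- cut(rank(x, ties.method = "first"), breaks = G, labels = FALSE)
  pro <- as.numeric(table(grp)) / n
  mu <- as.numeric(tapply(x, grp, mean))
  sigma2 <- as.numeric(tapply(x, grp, var))
  loglik <- numeric(0)
  for (it in seq_len(iter)) {
    # E-step: weighted component densities (n x G) and posteriors z[i, k]
    dens <- sapply(seq_len(G), function(k) pro[k] * dnorm(x, mu[k], sqrt(sigma2[k])))
    tot <- rowSums(dens)
    z <- dens / tot
    loglik <- c(loglik, sum(log(tot)))
    # M-step: update mixing proportions, means and variances from z
    nk <- colSums(z)
    pro <- nk / n
    mu <- colSums(z * x) / nk
    sigma2 <- colSums(z * outer(x, mu, "-")^2) / nk
    # Stop when the log-likelihood no longer changes appreciably
    if (it > 1 && abs(loglik[it] - loglik[it - 1]) < tol) break
  }
  # z holds the posterior probabilities from the last E-step
  list(pro = pro, mean = mu, variance = sigma2, z = z, loglik = loglik)
}

fit <- em_gmm1d(faithful$waiting)
fit$mean
fit$pro
```

For comparison, Mclust(faithful$waiting, G = 2, modelNames = "V") fits the corresponding mclust model; the estimates should be close but need not match exactly, because initialization and convergence criteria differ.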
Usage
Mclust(data, G = NULL, modelNames = NULL,
       prior = NULL,
       control = emControl(),
       initialization = NULL,
       warn = mclust.options("warn"),
       x = NULL,
       verbose = interactive(), ...)
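As a minimal sketch of how G and modelNames narrow the search (assuming the mclust package is installed; the data set, the range G = 2:4 and the two models "VEV" and "VVV" are purely illustrative choices), the following call computes BIC only for those 3 x 2 combinations and returns the best one:

```r
library(mclust)

# Restrict the search to 2-4 components and two covariance models
mod <- Mclust(iris[, 1:4], G = 2:4, modelNames = c("VEV", "VVV"))

mod$G          # selected number of components, one of 2, 3, 4
mod$modelName  # selected model, either "VEV" or "VVV"
mod$BIC        # BIC table: one row per value of G, one column per model
```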
Arguments
data
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations (n) and columns correspond to variables (d).
G
An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is G = 1:9.
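modelNames
A vector of character strings indicating the models to be fitted in the EM phase of clustering. The default is:
for univariate data (d = 1): c("E", "V");
for multivariate data (n > d): all the models available in mclust.options("emModelNames");
for multivariate data (n <= d): the spherical and diagonal models, i.e. c("EII", "VII", "EEI", "EVI", "VEI", "VVI").
The help file for mclustModelNames describes the available models.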
prior
The default assumes no prior, but this argument allows specification of a conjugate prior on the means and variances through the function priorControl.
control
A list of control parameters for EM. The defaults are set by the call emControl().
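initialization
A list containing zero or more of the following components:
hcPairs: a matrix of merge pairs for hierarchical clustering, such as produced by the function hc. For multivariate data, the default is to compute a hierarchical agglomerative clustering tree with hc, whose partitions are then used to start the EM algorithm. For univariate data, the default is to use quantiles to start the EM algorithm.
subset: a logical or numeric vector specifying a subset of the data to be used in the initial hierarchical clustering phase. By default no subset is used unless the number of observations exceeds the value of mclust.options("subset"). To guarantee exact reproducibility of results, a seed must be specified (see set.seed).
noise: a logical or numeric vector indicating an initial guess as to which observations are noise in the data. If numeric, the entries should correspond to row indexes of the data. If supplied, a noise term is added to the model in the estimation.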
warn
A logical value indicating whether or not certain warnings (usually related to singularity) should be issued. The default is controlled by mclust.options("warn").
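x
An object of class 'mclustBIC'. If supplied, BIC values for models that have already been computed and are available in x are not recomputed. All arguments, with the exception of data, G and modelNames, are ignored and their values are set as specified in the attributes of x. Defaults for G and modelNames are taken from x.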
verbose
A logical controlling whether a text progress bar is displayed during the fitting procedure. By default it is TRUE if the session is interactive, and FALSE otherwise.
...
Catches unused arguments in indirect or list calls via do.call.
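Because computing the BIC table is the expensive step, the x argument lets a stored table be reused. A minimal sketch, assuming mclust is installed; the object names X, BIC and mod are illustrative:

```r
library(mclust)

X <- iris[, 1:4]
# Compute BIC once over the default numbers of components and models
BIC <- mclustBIC(X)
# Refit the best model from the stored table without recomputing BIC
mod <- Mclust(X, x = BIC)

summary(BIC)   # top models ranked by BIC
mod$modelName
mod$G
```

Inspecting summary(BIC) before refitting is a convenient way to see how close the competing models are.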
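The initialization and control arguments can also be set explicitly. The sketch below, assuming mclust is installed, passes a precomputed hierarchical tree from hc(), runs the initial hierarchical phase on a random subset of 100 rows, and uses emControl(equalPro = TRUE) to constrain the mixing proportions to be equal. The object names, the "VVV" criterion for hc() and the subset size are illustrative choices, not package defaults.

```r
library(mclust)

X <- iris[, 1:4]

# Precompute a hierarchical agglomerative tree and use it to start EM
hcTree <- hc(X, modelName = "VVV")
mod_hc <- Mclust(X, initialization = list(hcPairs = hcTree))

# Run the initial hierarchical clustering on a random subset of rows;
# the seed makes the random subset (and hence the result) reproducible
set.seed(1)
mod_sub <- Mclust(X, initialization = list(subset = sample(nrow(X), 100)))

# EM control: force equal mixing proportions in a 3-component fit
mod_eq <- Mclust(X, G = 3, control = emControl(equalPro = TRUE))
mod_eq$parameters$pro
```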
Value
An object of class 'Mclust' providing the optimal (according to BIC) mixture model estimation.
The details of the output components are as follows:
call
The matched call.
data
The input data matrix.
modelName
A character string denoting the model at which the optimal BIC occurs.
n
The number of observations in the data.
d
The dimension of the data.
G
The optimal number of mixture components.
BIC
All BIC values computed for the combinations of numbers of components and models considered.
loglik
The log-likelihood corresponding to the optimal BIC.
df
The number of estimated parameters.
bic
BIC value of the selected model.
icl
ICL value of the selected model.
hypvol
The hypervolume parameter for the noise component if required, otherwise set to NULL.
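parameters
A list with the following components:
pro: a vector whose kth component is the mixing proportion for the kth component of the mixture model. If missing, equal proportions are assumed.
mean: the mean for each component. If there is more than one component, this is a matrix whose kth column is the mean of the kth component of the mixture model.
variance: a list of variance parameters for the model. The components of this list depend on the model specification; see the help file for mclustVariance for details.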
z
A matrix whose [i,k]th entry is the probability that observation i in the data belongs to the kth class.
classification
The classification corresponding to z, i.e. map(z).
uncertainty
The uncertainty associated with the classification, i.e. one minus the largest posterior probability in each row of z.
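Both classification and uncertainty are derived row by row from z. A base-R sketch on a hypothetical 3 x 2 posterior matrix, assuming classification is the maximum a posteriori (MAP) component, as computed by mclust's map(), and uncertainty is one minus the largest posterior probability:

```r
# Hypothetical posterior matrix: 3 observations, 2 components (rows sum to 1)
z <- matrix(c(0.9, 0.1,
              0.3, 0.7,
              0.5, 0.5), ncol = 2, byrow = TRUE)

# Classification: component with the largest posterior probability
classification <- apply(z, 1, which.max)
# Uncertainty: one minus the largest posterior probability
uncertainty <- 1 - apply(z, 1, max)

classification  # 1 2 1 (which.max resolves the tie in row 3 to component 1)
uncertainty     # 0.1 0.3 0.5
```

An observation split evenly between two components, like row 3, gets the maximum possible uncertainty for G = 2, namely 0.5.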
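Several output components are linked by simple formulas. A minimal check, assuming mclust is installed: mclust reports BIC in the form 2 * loglik - df * log(n), where larger is better, and ICL adds a non-positive classification term to BIC, so icl should never exceed bic. The shapes of the parameters components follow the descriptions in the list above.

```r
library(mclust)

mod <- Mclust(faithful)

# BIC in mclust's "larger is better" form
bic_check <- 2 * mod$loglik - mod$df * log(mod$n)
all.equal(mod$bic, bic_check)

# ICL penalizes BIC further, so it cannot be larger
mod$icl <= mod$bic

# pro has one entry per component; mean is a d x G matrix
length(mod$parameters$pro)
dim(mod$parameters$mean)
```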
References
Scrucca L., Fraley C., Murphy T. B. and Raftery A. E. (2023) Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman & Hall/CRC, ISBN: 978-1032234953, https://mclust-org.github.io/book/
Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, The R Journal, 8/1, pp. 289-317.
Fraley C. and Raftery A. E. (2002) Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association, 97/458, pp. 611-631.
Fraley C. and Raftery A. E. (2007) Bayesian regularization for normal mixture estimation and model-based clustering, Journal of Classification, 24, pp. 155-181.
See Also
summary.Mclust, plot.Mclust, priorControl, emControl, hc, mclustBIC, mclustModelNames, mclust.options
Examples
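library(mclust)
# Default fit: BIC over the default numbers of components and models
mod1 <- Mclust(iris[,1:4])
summary(mod1)
# Fix the number of mixture components at 3
mod2 <- Mclust(iris[,1:4], G = 3)
summary(mod2, parameters = TRUE)
# Using prior
mod3 <- Mclust(iris[,1:4], prior = priorControl())
summary(mod3)
# Prior with an explicit prior function and shrinkage value
mod4 <- Mclust(iris[,1:4], prior = priorControl(functionName="defaultPrior", shrinkage=0.1))
summary(mod4)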
# Clustering of faithful data with some artificial noise added
nNoise <- 100
set.seed(0) # to make it reproducible
# Uniform noise over a slightly extended range of each variable
Noise <- apply(faithful, 2, function(x)
               runif(nNoise, min = min(x)-.1, max = max(x)+.1))
data <- rbind(faithful, Noise)
plot(faithful)
points(Noise, pch = 20, cex = 0.5, col = "lightgrey")
set.seed(0)
# Random initial guess of the noise observations (TRUE with probability 3/4)
NoiseInit <- sample(c(TRUE,FALSE), size = nrow(faithful)+nNoise,
                    replace = TRUE, prob = c(3,1)/4)
mod5 <- Mclust(data, initialization = list(noise = NoiseInit))
summary(mod5, parameters = TRUE)
plot(mod5, what = "classification")
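A further example with univariate data, assuming mclust is installed: in one dimension only the models "E" (equal variance) and "V" (unequal variance) are fitted by default, which can be confirmed from the columns of the BIC table. The object name mod6 is illustrative.

```r
library(mclust)

# Univariate clustering of the waiting times in the faithful data
mod6 <- Mclust(faithful$waiting)
summary(mod6)
mod6$modelName        # either "E" or "V"
colnames(mod6$BIC)    # models for which BIC was computed

# Multivariate models tried by default when n > d
mclust.options("emModelNames")
```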