pugmm {PUGMM} | R Documentation |
Parsimonious Ultrametric Gaussian Mixture Models
Description
Model-based clustering via Parsimonious Ultrametric Gaussian Mixture Models (PUGMMs). The models describe hierarchical relationships among variables within and between clusters. Parameters are estimated with a grouped coordinate ascent algorithm, and the optimal model is selected according to BIC.
Usage
pugmm(
X,
G = NULL,
m = NULL,
normalization = NULL,
model = NULL,
maxiter = 500,
tol = 1e-06,
stop = "aitken",
rndstart = 1,
initG = "kmeans",
initm = "ucms",
gaussian = "mclust",
parallel = FALSE
)
Arguments
X |
( |
G |
Integer (vector) specifying the number of mixture components (default: |
m |
Integer (vector) specifying the number of variable groups (default: |
normalization |
Character string specifying the data transformation. If |
model |
Vector of character strings indicating the model names to be fitted. If |
maxiter |
Integer value specifying the maximum number of iterations of the algorithm (default: |
tol |
Numeric value specifying the tolerance for the convergence criterion (default: |
stop |
Character string specifying the convergence criterion. If "aitken", the Aitken acceleration-based stopping rule is used (default); if "relative", the relative change in the log-likelihood between two consecutive iterations is evaluated. |
rndstart |
Integer value specifying the number of random starts (default: |
initG |
Character string specifying the method for the initialization of the unit-component membership. If "kmeans", k-means via RcppArmadillo is used (default). Other options are: "random" for random assignment; "kmeansf" for fuzzy c-means (via the function fcm of the package ppclust). |
initm |
Character string specifying the method for the initialization of the variable-group membership. If "ucms", the variable-group membership is initialized with a multivariate model that has the same model.name as the Parsimonious Ultrametric Gaussian Mixture Model being estimated (default); if "random", a random assignment is performed. |
gaussian |
Character string specifying the way to compute the log-likelihood. If "mclust", |
parallel |
A logical value, specifying whether the models should be run in parallel. |
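The two options for stop can be illustrated with a small base R sketch. The helper names stop_relative and stop_aitken are hypothetical and not part of PUGMM, and the formulas below are one common form of each rule, assumed here rather than taken from the package source.

```r
# Illustrative helpers (hypothetical names, not part of PUGMM).
# ll: numeric vector of log-likelihood values, one per iteration so far.

# Relative rule: relative change between the last two iterations.
stop_relative <- function(ll, tol = 1e-06) {
  k <- length(ll)
  if (k < 2) return(FALSE)
  abs(ll[k] - ll[k - 1]) / abs(ll[k - 1]) < tol
}

# Aitken rule: extrapolate the limit of the sequence from the last three
# values and stop when the current value is within tol of that limit.
stop_aitken <- function(ll, tol = 1e-06) {
  k <- length(ll)
  if (k < 3) return(FALSE)
  d_new <- ll[k] - ll[k - 1]
  d_old <- ll[k - 1] - ll[k - 2]
  if (d_new == 0) return(TRUE)    # no change at all: treat as converged
  if (d_old == 0) return(FALSE)   # acceleration factor undefined
  a <- d_new / d_old              # Aitken acceleration factor
  if (a >= 1) return(FALSE)       # sequence is not contracting yet
  ll_inf <- ll[k - 1] + d_new / (1 - a)  # extrapolated limit
  abs(ll_inf - ll[k]) < tol
}

# Made-up log-likelihood path that converges geometrically to -100.
ll <- -100 - 10 * 0.5^(1:30)
stop_relative(ll[1:17])
stop_aitken(ll[1:17])
stop_aitken(ll)
```

On a slowly converging path the relative rule can fire while the log-likelihood is still some distance from its limit, which is the usual motivation for an Aitken-type rule.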
Details
The grouped coordinate ascent algorithm used to estimate the PUGMM parameters has been shown to be equivalent to an Expectation-Maximization algorithm in the GMM framework (Hathaway, 1986).
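The alternating structure behind this equivalence can be sketched in base R for a plain univariate two-component Gaussian mixture. This is not the PUGMM estimation routine: the data and starting values are made up, and only one pass over the two blocks of coordinates (memberships, then parameters) is shown.

```r
# Toy univariate two-component mixture (all numbers are made up).
x  <- c(-2.1, -1.9, 0.2, 1.8, 2.2)
pp <- c(0.5, 0.5)   # prior probabilities
mu <- c(-2, 2)      # component means
s  <- c(1, 1)       # component standard deviations

# Block 1: update the memberships with the parameters held fixed (E-step).
# post[i, g] is proportional to pp[g] * N(x[i] | mu[g], s[g]^2).
dens <- sapply(seq_along(pp), function(g) pp[g] * dnorm(x, mu[g], s[g]))
post <- dens / rowSums(dens)

# Block 2: update the parameters with the memberships held fixed (M-step).
ng     <- colSums(post)
pp_new <- ng / length(x)
mu_new <- colSums(post * x) / ng

# Maximum a posteriori classification of the units.
label <- max.col(post, ties.method = "first")
```

In this toy setting post and label play the same roles as the post and label components of the returned object.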
Value
An object of class pugmm containing the results of the optimal (according to BIC) Parsimonious Ultrametric Gaussian Mixture Model estimation.
call
Matched call.
X
Input data matrix.
G
Number of components of the best model.
m
Number of variable groups of the best model.
label
Integer vector of dimension n, taking values in \{1, \ldots, G\}. It identifies the unit classification according to the maximum a posteriori of the best model.
pp
Numeric vector of dimension G containing the prior probabilities for the best model.
mu
(G \times p) numeric matrix containing the component mean vectors (by row) for the best model.
sigma
List of dimension G containing the (p \times p) numeric component extended ultrametric covariance matrices for the best model.
V
List of dimension G containing the (p \times m) binary variable-group membership matrices for the best model.
Sv
List of dimension G containing the (m \times m) numeric diagonal matrices of the group variances for the best model.
Sw
List of dimension G containing the (m \times m) numeric diagonal matrices of the within-group covariances for the best model.
Sb
List of dimension G containing the (m \times m) numeric hollow matrices of the between-group covariances for the best model.
post
(n \times G) numeric matrix containing the posterior probabilities for the best model.
pm
Number of parameters of the best model.
pm.cov
Number of covariance parameters of the best model.
pm.free
Number of free parameters of the best model (pm - (constraints on V + count.constr.SwSb + count.constr.SvSw)).
count.constr.SwSb
Number of times the constraint between Sw and Sb has been turned on for the best model.
count.constr.SvSw
Number of times the constraint between Sv and Sw has been turned on for the best model.
BIC
BIC values for all the fitted models. A value of NA means that the model has not been computed because its structure is equal to that of another model; a value of -Inf means that the solution has a number of clusters < G.
bic
BIC value of the best model.
loglik
Log-likelihood of the best model.
loop
Random start corresponding to the selected solution of the best model.
iter
Number of iterations needed to estimate the best model.
model.name
Character string denoting the PUGMM model name of the best model among the ones fitted.
messages
Messages.
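The components V, Sv, Sw and Sb relate to sigma through the extended ultrametric covariance structure. The base R sketch below rebuilds one covariance matrix from toy inputs. It assumes the form Sigma = V (Sw + Sb) V' - diag(V Sw V') + diag(V Sv V') described in the references; the helper name rebuild_sigma and all numbers are hypothetical and not part of PUGMM.

```r
# Toy inputs (made up for illustration): p = 4 variables, m = 2 groups.
V  <- matrix(c(1, 0,
               1, 0,
               0, 1,
               0, 1), nrow = 4, byrow = TRUE)  # (p x m) binary membership
Sv <- diag(c(2.0, 1.5))                        # group variances
Sw <- diag(c(0.8, 0.6))                        # within-group covariances
Sb <- matrix(c(0,   0.3,
               0.3, 0), nrow = 2)              # hollow between-group covariances

# Assumed structure (hypothetical helper, not exported by PUGMM):
# Sigma = V (Sw + Sb) V' - diag(V Sw V') + diag(V Sv V')
rebuild_sigma <- function(V, Sv, Sw, Sb) {
  p        <- nrow(V)
  full     <- V %*% (Sw + Sb) %*% t(V)  # within and between covariances
  within   <- V %*% Sw %*% t(V)         # its diagonal is removed below
  variance <- V %*% Sv %*% t(V)         # its diagonal supplies the variances
  full - diag(diag(within), nrow = p) + diag(diag(variance), nrow = p)
}

Sigma <- rebuild_sigma(V, Sv, Sw, Sb)
Sigma
```

Under the same assumption, applying the helper to the g-th elements of the V, Sv, Sw and Sb lists of a fitted object would be expected to reproduce the g-th element of sigma.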
References
Cavicchia, C., Vichi, M., Zaccaria, G. (2024) Parsimonious ultrametric Gaussian mixture models. Statistics and Computing, 34, 108.
Cavicchia, C., Vichi, M., Zaccaria, G. (2022) Gaussian mixture model with an extended ultrametric covariance structure. Advances in Data Analysis and Classification, 16(2), 399-427.
Hathaway, R. (1986) Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4(2), 53-56.
See Also
pugmm_available_models(), plot.pugmm()
Examples
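# Load the penguins data and standardize columns 2 to 5
data(penguins)
x <- scale(penguins[, 2:5])
# Fit a single configuration: G = 3 components, m = 1 variable group
pugmm.penguins <- pugmm(x, G = 3, m = 1)
# Cross-tabulate species against the estimated classification
table(penguins$species, pugmm.penguins$label)
# Fit with the default settings and inspect the model selected by BIC
pugmm.penguins <- pugmm(x)
pugmm.penguins$G
pugmm.penguins$m
pugmm.penguins$model.name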