R: Choose the best MoEClust model

MoE_compare {MoEClust}

R Documentation

Choose the best MoEClust model

Description

Takes one or more sets of MoEClust models fitted by MoE_clust (or MoE_stepwise) and ranks them according to the BIC, ICL, or AIC. It's possible to respect the internal ranking within each set of models, or to discard models within each set which were already deemed sub-optimal. This function can help with model selection via exhaustive or stepwise searches.

Usage

MoE_compare(...,
            criterion = c("bic", "icl", "aic"),
            pick = 10L,
            optimal.only = FALSE)

## S3 method for class 'MoECompare'
print(x,
      index = seq_len(x$pick),
      posidens = TRUE,
      rerank = FALSE,
      digits = 3L,
      details = TRUE, 
      maxi = length(index),
      ...)

Arguments

`...`	One or more objects of class `"MoEClust"` returned by `MoE_clust`. All models must have been fitted to the same data set. A single named list of such objects can also be supplied. Additionally, objects of class `"MoECompare"` returned by this very function can also be supplied here. This argument is only relevant for the `MoE_compare` function and will be ignored for the associated `print` function.
`criterion`	The criterion used to determine the ranking. Defaults to `"bic"`.
`pick`	The (integer) number of models to be ranked and compared. Defaults to `10L`. Will be constrained by the number of models within the `"MoEClust"` objects supplied via `...` if `optimal.only` is `FALSE`, otherwise constrained simply by the number of `"MoEClust"` objects supplied. Setting `pick=Inf` is a valid way to select all models.
`optimal.only`	Logical indicating whether to only rank models already deemed optimal within each `"MoEClust"` object (`TRUE`), or to allow models which were deemed suboptimal to enter the final ranking (`FALSE`, the default). See 'Details'.
`x`, `index`, `posidens`, `rerank`, `digits`, `details`, `maxi`	Arguments required for the associated `print` function:
`x`	An object of class `"MoECompare"` resulting from a call to `MoE_compare`.
`index`	A logical or numeric vector giving the indices of the rows of the table of ranked models to print. This defaults to the full set of ranked models. It can be useful when the table of ranked models is large to examine a subset via this `index` argument, for display purposes. See `rerank`.
`posidens`	A logical indicating whether models which have been flagged for having positive log-densities should be included in the comparison (defaults to `TRUE`). Such models may correspond to spurious solutions and can be discarded by specifying `posidens=FALSE`. Only relevant if any of the `"MoEClust"` objects being compared were themselves run with `posidens=TRUE`.
`rerank`	A logical indicating whether the ranks should be recomputed when subsetting using `index`. Defaults to `FALSE`. Only relevant when `details=TRUE`.
`digits`	The number of decimal places to round model selection criteria to (defaults to 3).
`details`	Logical indicating whether some additional details should be printed, defaults to `TRUE`. Exists to facilitate `MoE_stepwise` printing.
`maxi`	A number specifying the maximum number of rows/models to print. Defaults to `length(index)`.

Details

The purpose of this function is to conduct model selection on "MoEClust" objects, fit to the same data set, with different combinations of gating/expert network covariates or different initialisation settings.

Model selection will have already been performed in terms of choosing the optimal number of components and GPCM/mclust model type within each supplied set of results. If optimal.only is FALSE, MoE_compare respects the internal ranking of models within each set when producing the final ranking; otherwise, only those models already deemed optimal within each "MoEClust" object are ranked.

As such, if two sets of results are supplied when optimal.only is FALSE, the 1st, 2nd, and 3rd best models could all belong to the first set of results, meaning a model deemed suboptimal under one set of covariates could be superior to one deemed optimal under another set of covariates.
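
The difference between the two ranking modes can be sketched in base R without fitting any models. This is only an illustration of the idea, not the package's implementation: the BIC values, set labels, and object names (fits, pooled, optimal) are all hypothetical, and the sketch assumes the convention in which larger BIC values are better.

```r
# Hypothetical BIC values for two sets of fits (say, two covariate settings).
# Assumption: larger BIC is better.
fits <- data.frame(
  set   = c("A", "A", "A", "B", "B", "B"),
  model = c("G=2", "G=3", "G=1", "G=2", "G=3", "G=1"),
  bic   = c(-150.2, -152.7, -155.1, -158.9, -163.0, -180.3),
  stringsAsFactors = FALSE
)

# Analogue of optimal.only = FALSE: pool every model and rank them together
pooled      <- fits[order(fits$bic, decreasing = TRUE), ]
pooled$rank <- seq_len(nrow(pooled))

# Analogue of optimal.only = TRUE: keep the best model of each set, then rank
best_per_set <- do.call(rbind, lapply(split(fits, fits$set),
                                      function(d) d[which.max(d$bic), ]))
optimal      <- best_per_set[order(best_per_set$bic, decreasing = TRUE), ]

head(pooled, 3)  # the top three models all come from set "A"
optimal          # one row per set
```

With these made-up values, the pooled ranking places two models that are suboptimal within set A ahead of the model deemed optimal within set B.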

Value

A list of class "MoECompare", for which a dedicated print function exists, containing the following elements (each of length pick, and ranked according to criterion, where appropriate):

`data`	The name of the data set to which the models were fitted.
`optimal`	The single optimal model (an object of class `"MoEClust"`) among those supplied, according to the chosen `criterion`.
`pick`	The final number of ranked models. May differ from (i.e. be less than) the supplied `pick` value.
`MoENames`	The names of the supplied `"MoEClust"` objects.
`modelNames`	The `mclustModelNames`.
`G`	The optimal numbers of components.
`df`	The numbers of estimated parameters.
`iters`	The numbers of EM/CEM iterations.
`bic`	BIC values, ranked according to `criterion`.
`icl`	ICL values, ranked according to `criterion`.
`aic`	AIC values, ranked according to `criterion`.
`loglik`	Maximal log-likelihood values, ranked according to `criterion`.
`gating`	The gating formulas.
`expert`	The expert formulas.
`algo`	The algorithm used for fitting the model: one of `"EM"`, `"CEM"`, or `"cemEM"`.
`equalPro`	Logical indicating whether mixing proportions were constrained to be equal across components.
`hypvol`	Hypervolume parameters for the noise component if relevant, otherwise set to `NA` (see `MoE_control`).
`noise`	The type of noise component fitted (if any). Only displayed if at least one of the compared models has a noise component.
`noise.gate`	Logical indicating whether gating covariates were allowed to influence the noise component's mixing proportion. Only printed for models with a noise component, when at least one of the compared models has gating covariates.
`equalNoise`	Logical indicating whether the mixing proportion of the noise component for `equalPro` models is also equal (`TRUE`) or estimated (`FALSE`).
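
Because the ranked elements are described as each having length pick, a few of them can be collected into a single table for further inspection. The following is a minimal sketch under stated assumptions: the MoEClust package is installed, the elements used are plain vectors, and the names comp_table, set, and model are illustrative only. The block is guarded so that it does nothing if the package is unavailable.

```r
comp_table <- NULL
if (requireNamespace("MoEClust", quietly = TRUE)) {
  suppressPackageStartupMessages(library(MoEClust))
  data(CO2data, package = "MoEClust")
  CO2  <- CO2data$CO2
  GNP  <- CO2data$GNP

  # Two sets of fits to the same data, without and with an expert covariate
  m1   <- MoE_clust(CO2, G = 1:3)
  m3   <- MoE_clust(CO2, G = 1:3, expert = ~ GNP)
  comp <- MoE_compare(m1, m3, pick = 5)

  # Assemble selected elements of the "MoECompare" object, one row per model
  comp_table <- data.frame(set    = comp$MoENames,
                           model  = comp$modelNames,
                           G      = comp$G,
                           expert = comp$expert,
                           bic    = comp$bic,
                           row.names = NULL)
  print(comp_table)
}
```

The number of rows is governed by comp$pick, which may be smaller than the value of pick that was requested.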

Note

The criterion argument here need not match the criterion used for model selection within each "MoEClust" object. Be aware, however, that a mismatch may require the optimal model to be re-fitted in order to be extracted, thereby slowing down MoE_compare.

If random starts were used via init.z="random", the optimal model may not necessarily correspond to the highest-ranking model in the presence of a criterion mismatch, due to the randomness of the initialisation.
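
A criterion mismatch of the kind described above might look as follows. This is a hedged sketch rather than a recommended workflow: it assumes the MoEClust package is installed and that the fits below use the package's default selection criterion internally, while the comparison is ranked by ICL. The name comp_icl is illustrative.

```r
comp_icl <- NULL
if (requireNamespace("MoEClust", quietly = TRUE)) {
  suppressPackageStartupMessages(library(MoEClust))
  data(CO2data, package = "MoEClust")
  CO2 <- CO2data$CO2
  GNP <- CO2data$GNP

  # Models are selected within each object using the default criterion
  m1  <- MoE_clust(CO2, G = 1:3)
  m3  <- MoE_clust(CO2, G = 1:3, expert = ~ GNP)

  # Rank across both objects by ICL instead; per the note above, the
  # optimal model may need to be re-fitted before it can be extracted
  comp_icl <- MoE_compare(m1, m3, criterion = "icl")
  print(comp_icl)
}
```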

A dedicated print function exists for objects of class "MoECompare".

plot.MoEClust and as.Mclust can both also be called on objects of class "MoECompare".

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.

Examples

library(MoEClust)
data(CO2data)
CO2   <- CO2data$CO2
GNP   <- CO2data$GNP

# Fit a range of models 
m1    <- MoE_clust(CO2, G=1:3)
m2    <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3    <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4    <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5    <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
m7    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, tau0=0.1)

# Rank only the optimal models and examine the best model
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine all models visited, including those already deemed suboptimal
# Only print models with expert covariates & more than one component
comp2 <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, pick=Inf)
print(comp2, index=comp2$expert != "None" & comp2$G > 1)

# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, GNP))

# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))

# Compare both sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, optimal.only=TRUE)$optimal)

[Package MoEClust version 1.5.2 Index]