MEDseq_compare {MEDseq} | R Documentation |
Choose the best MEDseq model
Description
Takes one or more sets of "MEDseq" models fitted by MEDseq_fit and ranks them according to a specified model selection criterion. It is possible either to respect the internal ranking within each set of models or to discard models within each set which were already deemed suboptimal. This function can help with model selection via exhaustive or stepwise searches.
Usage
MEDseq_compare(...,
               criterion = c("bic", "icl", "aic",
                             "dbs", "asw", "cv", "nec"),
               pick = 10L,
               optimal.only = FALSE)
## S3 method for class 'MEDseqCompare'
print(x,
      index = seq_len(x$pick),
      rerank = FALSE,
      digits = 3L,
      maxi = length(index),
      ...)
Arguments
... |
One or more objects of class This argument is only relevant for the |
criterion |
The criterion used to determine the ranking. Defaults to |
pick |
The (integer) number of models to be ranked and compared. Defaults to |
optimal.only |
Logical indicating whether to only rank models already deemed optimal within each |
x , index , rerank , digits , maxi |
Arguments required for the associated
|
Details
The purpose of this function is to conduct model selection on "MEDseq" objects, fitted to the same data set, with different combinations of gating network covariates or different initialisation settings.
Model selection will have already been performed in terms of choosing the optimal number of components and MEDseq model type within each supplied set of results, but MEDseq_compare will respect the internal ranking of models when producing the final ranking if optimal.only is FALSE: otherwise only those models already deemed optimal within each "MEDseq" object will be ranked.
As such, if two sets of results are supplied when optimal.only is FALSE, the 1st, 2nd, and 3rd best models could all belong to the first set of results, meaning a model deemed suboptimal according to one set of covariates could be superior to one deemed optimal under another set of covariates.
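The difference between the two ranking modes can be illustrated with a toy sketch in base R. It does not call MEDseq at all: the data frame fits, its columns, and the criterion values are hypothetical, and larger criterion values are assumed to be better purely for the sake of the illustration.

```r
# Hypothetical criterion values for two sets of fits (NOT real MEDseq output).
# Assumption: larger values of the criterion indicate a better model.
fits <- data.frame(
  set       = c("m1", "m1", "m1", "m2", "m2", "m2"),
  G         = c(8L, 9L, 10L, 8L, 9L, 10L),
  criterion = c(-105, -100, -101, -120, -110, -115)
)

# Analogue of optimal.only = FALSE: pool every model from every set, then rank
pooled <- fits[order(fits$criterion, decreasing = TRUE), ]

# Analogue of optimal.only = TRUE: keep only the best model within each set, then rank
is_best <- fits$criterion == ave(fits$criterion, fits$set, FUN = max)
best    <- fits[is_best, ]
best    <- best[order(best$criterion, decreasing = TRUE), ]

head(pooled, 3)  # the top three rows all come from set "m1"
best             # one row per set
```

In the pooled ranking, models from "m1" that were not optimal within their own set still outrank the optimal model from "m2", which is the situation described above.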
Value
A list of class "MEDseqCompare", for which a dedicated print function exists, containing the following elements (each of length pick, and ranked according to criterion, where appropriate):
data |
The name of the data set to which the models were fitted. |
optimal |
The single optimal model (an object of class |
pick |
The final number of ranked models. May be different (i.e. less than) the supplied |
MEDNames |
The names of the supplied |
modelNames |
The MEDseq model names (denoting the constraints or lack thereof on the precision parameters). |
G |
The optimal numbers of components. |
df |
The numbers of estimated parameters. |
iters |
The numbers of EM/CEM iterations. |
bic |
BIC values, ranked according to |
icl |
ICL values, ranked according to |
aic |
AIC values, ranked according to |
dbs |
(Weighted) mean/median DBS values, ranked according to |
asw |
(Weighted) mean/median ASW values, ranked according to |
cv |
Cross-validated log-likelihood values, ranked according to |
nec |
NEC values, ranked according to |
loglik |
Maximal log-likelihood values, ranked according to |
gating |
The gating formulas. |
algo |
The algorithm used for fitting the model - either |
equalPro |
Logical indicating whether mixing proportions were constrained to be equal across components. |
opti |
The method used for estimating the central sequence(s). |
weights |
Logical indicating whether the given model was fitted with sampling weights. |
noise |
Logical indicating the presence/absence of a noise component. Only displayed if at least one of the compared models has a noise component. |
noise.gate |
Logical indicating whether gating covariates were allowed to influence the noise component's mixing proportion. Only printed for models with a noise component, when at least one of the compared models has gating covariates. |
equalNoise |
Logical indicating whether the mixing proportion of the noise component for |
Note
The criterion argument here need not comply with the criterion used for model selection within each "MEDseq" object, but be aware that a mismatch in terms of criterion may require the optimal model to be re-fit in order to be extracted, thereby slowing down MEDseq_compare.
If random starts had been used via init.z="random", the optimal model may not necessarily correspond to the highest-ranking model in the presence of a criterion mismatch, due to the randomness of the initialisation.
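A criterion mismatch of the kind described above can be pictured with another toy sketch in base R. The values and column names are hypothetical, larger values are assumed to be better for both criteria, and the sketch says nothing about how MEDseq_compare handles the re-fit internally.

```r
# Hypothetical values for a single set of fits (NOT real MEDseq output).
# Assumption: larger is better for both illustrative criteria.
fits <- data.frame(
  G   = c(8L, 9L, 10L),
  bic = c(-100, -98, -103),
  dbs = c(0.41, 0.35, 0.47)
)

# Model that would have been chosen within the set under one criterion
chosen_at_fit <- fits$G[which.max(fits$bic)]

# Model favoured when the comparison uses a different criterion
chosen_at_compare <- fits$G[which.max(fits$dbs)]

# TRUE here: the two criteria disagree about the best model
chosen_at_fit != chosen_at_compare
```

When the two choices differ like this, the model favoured by the comparison is not the one selected at fitting time, which is consistent with the note that a re-fit may be needed to extract it.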
A dedicated print function exists for objects of class "MEDseqCompare" and plot.MEDseq can also be called on objects of class "MEDseqCompare".
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
See Also
Examples
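# Assumes the MEDseq package is attached; biofam and seqdef() originate in TraMineR
library(MEDseq)
data(biofam)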
seqs <- seqdef(biofam[10:25] + 1L,
               states = c("P", "L", "M", "L+M", "C",
                          "L+C", "L+M+C", "D"))
covs <- cbind(biofam[2:3], age=2002 - biofam$birthyr)
# Fit a range of models
# m1 <- MEDseq_fit(seqs, G=9:10)
# m2 <- MEDseq_fit(seqs, G=9:10, gating=~sex, covars=covs, noise.gate=FALSE)
# m3 <- MEDseq_fit(seqs, G=9:10, gating=~age, covars=covs, noise.gate=FALSE)
# m4 <- MEDseq_fit(seqs, G=9:10, gating=~sex + age, covars=covs, noise.gate=FALSE)
# Rank only the optimal models (according to the dbs criterion)
# (comp <- MEDseq_compare(m1, m2, m3, m4, criterion="dbs", optimal.only=TRUE))
# Examine the best model in more detail
# (best <- comp$optimal)
# (summ <- summary(best, parameters=TRUE))
# Examine all models visited, including those already deemed suboptimal
# comp2 <- MEDseq_compare(comp, m1, m2, m3, m4, criterion="dbs", pick=Inf)
# Only print models with gating covariates & 10 components
# print(comp2, index=comp2$gating != "None" & comp2$G == 10)