MEDseq_control {MEDseq} | R Documentation |
Set control values for use with MEDseq_fit
Description
Supplies a list of arguments (with defaults) for use with MEDseq_fit.
Usage
MEDseq_control(algo = c("EM", "CEM", "cemEM"),
               init.z = c("kmedoids", "kmodes", "kmodes2", "hc", "random", "list"),
               z.list = NULL,
               dist.mat = NULL,
               unique = TRUE,
               criterion = c("bic", "icl", "aic", "dbs", "asw", "cv", "nec"),
               tau0 = NULL,
               noise.gate = TRUE,
               random = TRUE,
               do.cv = FALSE,
               do.nec = FALSE,
               nfolds = 10L,
               nstarts = 1L,
               stopping = c("aitken", "relative"),
               equalPro = FALSE,
               equalNoise = FALSE,
               tol = c(1E-05, 1E-08),
               itmax = c(.Machine$integer.max, 1000L),
               opti = c("mode", "medoid", "first", "GA"),
               ordering = c("none", "decreasing", "increasing"),
               MaxNWts = 1000L,
               verbose = TRUE,
               ...)
Arguments
algo |
Switch controlling whether models are fit using the |
init.z |
The method used to initialise the cluster labels. All options respect the presence of sampling The |
z.list |
A user supplied list of initial cluster allocation matrices, with number of rows given by the number of observations, and numbers of columns given by the range of component numbers being considered. Only relevant if |
dist.mat |
An optional distance matrix to use for initialisation when |
unique |
A logical indicating whether the model is fit only to the unique observations (defaults to When When In both cases, the results will be unchanged, but setting |
criterion |
When either |
tau0 |
Prior mixing proportion for the noise component. If supplied, a noise component will be added to the model in the estimation, with |
noise.gate |
A logical indicating whether gating network covariates influence the mixing proportion for the noise component, if any. Defaults to |
random |
A logical governing how ties for estimated central sequence positions are handled. When Note that this argument is also passed to |
do.cv |
A logical indicating whether cross-validated log-likelihood scores should also be computed (see |
do.nec |
A logical indicating whether the normalised entropy criterion (NEC) should also be computed (for models with more than one component). Defaults to |
nfolds |
The number of folds to use when |
nstarts |
The number of random initialisations to use when |
stopping |
The criterion used to assess convergence of the EM/CEM algorithm. The default ( |
equalPro |
Logical variable indicating whether or not the mixing proportions are to be constrained to be equal in the model. Default: |
equalNoise |
Logical which is only invoked when |
tol |
A vector of length two giving relative convergence tolerances for 1) the log-likelihood of the EM/CEM algorithm, and 2) optimisation in the multinomial logistic regression in the gating network, respectively. The default is |
itmax |
A vector of length two giving integer limits on the number of iterations for 1) the EM/CEM algorithm, and 2) the multinomial logistic regression in the gating network, respectively. The default is If, for any model with gating covariates, the multinomial logistic regression in the gating network fails to converge in |
opti |
Character string indicating how central sequence parameters should be estimated. The default |
ordering |
Experimental feature that should only be tampered with by experienced users. Allows sequences to be reordered on the basis of the column-wise entropy when |
MaxNWts |
The maximum allowable number of weights in the call to |
verbose |
Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is |
... |
Catches unused arguments, and also allows the optional arguments |
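The two options for stopping can be illustrated with a minimal base-R sketch. This is not the package's internal code: the helper names aitken_converged and relative_converged are hypothetical, and the exact formulas used by MEDseq_fit may differ. The sketch only shows the general idea of applying a tolerance such as tol[1] to a sequence of log-likelihood values, either through an Aitken-accelerated estimate of the limiting log-likelihood or through a simple relative change.

```r
# Hypothetical helpers (not part of MEDseq): 'll' is the vector of
# log-likelihood values recorded so far, newest value last.

# Aitken-based rule: extrapolate the limit of the sequence from the last
# three values and stop once the newest value is within 'tol' of it.
aitken_converged <- function(ll, tol = 1e-05) {
  n <- length(ll)
  if (n < 3L) return(FALSE)               # need three values to extrapolate
  l1 <- ll[n - 2L]; l2 <- ll[n - 1L]; l3 <- ll[n]
  if (l3 == l2) return(TRUE)              # no change at all in the last step
  a     <- (l3 - l2) / (l2 - l1)          # Aitken acceleration factor
  l_inf <- l2 + (l3 - l2) / (1 - a)       # estimated limiting log-likelihood
  abs(l_inf - l3) < tol
}

# Relative rule (one common form): compare the last change in the
# log-likelihood with its current magnitude.
relative_converged <- function(ll, tol = 1e-05) {
  n <- length(ll)
  if (n < 2L) return(FALSE)
  abs(ll[n] - ll[n - 1L]) / (1 + abs(ll[n])) < tol
}

# A toy log-likelihood path that approaches -100 geometrically
ll_path <- -100 - 10 * 0.5^(0:30)
aitken_converged(ll_path[1:3])    # early iterations: not yet converged
aitken_converged(ll_path)         # late iterations: converged
relative_converged(ll_path)
```

For a slowly but steadily improving log-likelihood, the Aitken-based rule judges convergence against the extrapolated limit rather than the size of the most recent step, which is the usual motivation for preferring it.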
Details
MEDseq_control is provided for assigning values and defaults within MEDseq_fit. While the criterion argument controls the choice of the optimal number of components and MEDseq model type (in terms of the constraints or lack thereof on the precision parameters), MEDseq_compare is provided for choosing between fits with different combinations of covariates or different initialisation settings.
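The division of labour between criterion and MEDseq_compare might look as follows in practice. This is a sketch under assumptions rather than a definitive recipe: it assumes the MEDseq package is installed, that MEDseq_compare accepts fitted models as unnamed arguments along with a criterion argument, and that both initialisation methods run on these data. The range G=2:3 is chosen only to keep the run short.

```r
# Sketch only: skipped entirely when MEDseq is not installed
if (requireNamespace("MEDseq", quietly = TRUE)) {
  library(MEDseq)
  data(mvad)
  seqs <- seqdef(mvad[,17:86])

  # Same data and model type, two different initialisation settings;
  # within each call, 'criterion' selects the number of components
  m1 <- MEDseq_fit(seqs, G=2:3, modtype="CC", weights=mvad$weight,
                   ctrl=MEDseq_control(init.z="kmedoids", criterion="bic", verbose=FALSE))
  m2 <- MEDseq_fit(seqs, G=2:3, modtype="CC", weights=mvad$weight,
                   ctrl=MEDseq_control(init.z="hc", criterion="bic", verbose=FALSE))

  # Choosing between the two fits is then a job for MEDseq_compare
  comp <- MEDseq_compare(m1, m2, criterion="bic")
  print(comp)
}
```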
Value
A named list in which the names are the names of the arguments and the values are the values supplied to the arguments.
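The returned object can be inspected like any other list. The following is a brief sketch, assuming the MEDseq package is installed; the element name algo is taken from the description above, which states that list names match the argument names.

```r
# Sketch only: skipped entirely when MEDseq is not installed
if (requireNamespace("MEDseq", quietly = TRUE)) {
  ctrl <- MEDseq::MEDseq_control(algo="CEM", equalPro=TRUE)
  print(is.list(ctrl))    # the control settings are held in a list
  print(names(ctrl))      # element names of that list
  print(ctrl$algo)        # value supplied for the 'algo' argument
}
```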
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
Menardi, G. (2011). Density-based silhouette diagnostics for clustering methods. Statistics and Computing, 21(3): 295-308.
Hoos, H. and Stützle, T. (2004). Stochastic Local Search: Foundations and Applications. The Morgan Kaufmann Series in Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
See Also
MEDseq_fit, dbs, wcKMedoids, pam, wKModes, hclust, seqdist, multinom, MEDseq_compare
Examples
# The CC MEDseq model is almost equivalent to k-medoids when the
# CEM algorithm is employed, mixing proportions are constrained,
# and the central sequences are restricted to the observed sequences
ctrl <- MEDseq_control(algo="CEM", equalPro=TRUE, opti="medoid", criterion="asw")
data(mvad)
# Note that ctrl must be explicitly named 'ctrl'
mod <- MEDseq_fit(seqdef(mvad[,17:86]), G=11, modtype="CC", weights=mvad$weight, ctrl=ctrl)
# Alternatively, specify the control arguments directly
mod <- MEDseq_fit(seqdef(mvad[,17:86]), G=11, modtype="CC", weights=mvad$weight,
                  algo="CEM", equalPro=TRUE, opti="medoid", criterion="asw")
# Note that supplying control arguments via a mix of the ... construct and the named argument
# 'ctrl', or supplying MEDseq_control output without naming it 'ctrl', can throw an error