MEDseq-package {MEDseq}R Documentation

MEDseq: Mixtures of Exponential-Distance Models with Covariates

Description

Fits MEDseq models: mixtures of Exponential-Distance models with gating covariates and sampling weights. Typically used for clustering categorical/longitudinal life-course sequences.

Usage

Fits _MEDseq_ models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>, i.e. fits mixtures of exponential-distance models for clustering longitudinal life-course sequence data via the EM/CEM algorithm.

A family of parsimonious precision parameter constraints are accommodated. So too are sampling weights. Gating covariates can be supplied via formula interfaces.

The most important function in the MEDseq package is: MEDseq_fit, for fitting the models via EM/CEM. This function requires the data to be in "stslist" format; the function seqdef is conveniently reexported from the TraMineR package for this purpose.

MEDseq_control allows supplying additional arguments which govern, among other things, controls on the initialisation of the allocations for the EM/CEM algorithm and the various model selection options.

MEDseq_compare is provided for conducting model selection between different results from using different covariate combinations &/or initialisation strategies, etc.

MEDseq_stderr is provided for computing the standard errors of the coefficients for the covariates in the gating network.

A dedicated plotting function plot.MEDseq exists for visualising various aspects of the results, using new methods as well as some existing methods adapted from the TraMineR package.

Finally, the package also contains two data sets: biofam and mvad.

Details

Type:

Package

Package:

MEDseq

Version:

1.4.1

Date:

2023-12-12 (this version), 2019-08-24 (original release)

Licence:

GPL (>= 3)

See Also

Further details and examples are given in the associated vignette document:
vignette("MEDseq", package = "MEDseq")

Author(s)

Keefe Murphy [aut, cre], Thomas Brendan Murphy [ctb], Raffaella Piccarreta [ctb], Isobel Claire Gormley [ctb]

Maintainer: Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.

See Also

Useful links:

Examples

# Load the MVAD data
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)
                         
# Fit a range of unweighted models without covariates
# Only consider models with a noise component
# Supply some MEDseq_control() arguments
mod1          <- MEDseq_fit(mvad.seq, G=9:10, modtype=c("CCN", "CUN", "UCN", "UUN"),
                            algo="CEM", init.z="kmodes", criterion="icl")

# Fit a model with weights and gating covariates
# Have the probability of noise-component membership be constant
mod2          <- MEDseq_fit(mvad.seq, G=11, modtype="UUN", weights=mvad$weights, 
                            gating=~ gcse5eq, covars=mvad.cov, noise.gate=FALSE)
                            
# Examine this model and its gating network
summary(mod2, network=TRUE)
plot(mod2, "clusters")

[Package MEDseq version 1.4.1 Index]