MEDseq_clustnames {MEDseq} | R Documentation |
Automatic labelling of clusters using central sequences
Description
These functions extract names for clusters according to the SPS representation of their central sequences.
Usage
MEDseq_clustnames(x,
                  cluster = TRUE,
                  size = FALSE,
                  weighted = FALSE,
                  ...)

MEDseq_nameclusts(names)
Arguments
x |
An object of class |
cluster |
A logical indicating whether names should be prepended with the text " |
size |
A logical indicating whether the (typically 'soft') size of each cluster is appended to the label of each group, expressed as a percentage of the total number of observations. Defaults to |
weighted |
A logical indicating whether the sampling weights (if any) are used when appending the |
... |
Catches unused arguments. |
names |
The output of |
Details
Unlike the seqclustname function from the WeightedCluster package, which inspired these functions, MEDseq_clustnames returns only the names themselves, not the factor variable indicating cluster membership with labels given by those names. MEDseq_nameclusts is therefore provided as a convenience function for precisely this purpose (see Examples).
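The relationship between the two functions can be illustrated in base R. The following is a minimal sketch, not the package's internal code: the MAP vector and the nms labels are made-up stand-ins for x$MAP and the output of MEDseq_clustnames, and it assumes that the noise component is coded 0 and named last.

```r
# Hypothetical MAP classification for 8 observations
# (coding the noise component as 0 is an assumption of this sketch)
MAP <- c(1, 2, 2, 0, 3, 1, 3, 3)

# Hypothetical SPS-style names, one per component, with the noise name last
nms <- c("(EM,70)", "(FE,22)-(EM,48)", "(SC,25)-(HE,45)", "Noise")

# Factor version of MAP whose levels are the cluster names
group <- factor(MAP, levels = c(1, 2, 3, 0), labels = nms)

levels(group)
table(group)
```

A factor built this way can be supplied wherever a grouping variable is expected, which is the role played by the output of MEDseq_nameclusts.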
Value
For MEDseq_clustnames, a character vector containing the names for each component defined by their central sequence, and optionally the cluster name (see cluster above) and cluster size (see size above). The name for the noise component, if any, will always be simply "Noise" (or "Cluster 0: Noise").
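To make the optional prefix and size more concrete, here is a rough base R sketch of how such labels could be assembled. It is an illustration under stated assumptions rather than the package's implementation: the membership matrix z, the weights w, the SPS strings, and in particular the exact " (xx%)" label format are all hypothetical.

```r
# Hypothetical soft membership probabilities: 4 observations,
# 2 clusters plus a noise component in the last column
z <- matrix(c(0.8, 0.1, 0.1,
              0.6, 0.3, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.2, 0.7), ncol = 3, byrow = TRUE)
w   <- c(2, 1, 1, 1)                            # hypothetical sampling weights
sps <- c("(EM,70)", "(FE,22)-(EM,48)", "Noise") # hypothetical names

# 'Soft' cluster sizes as percentages of the number of observations
size_unw <- 100 * colMeans(z)

# Weighted analogue: row i of z is scaled by w[i] before summing
size_wtd <- 100 * colSums(z * w) / sum(w)

# Prepend "Cluster g: " (noise indexed as 0) and append the weighted size
idx  <- c(1, 2, 0)
labs <- paste0("Cluster ", idx, ": ", sps, " (", round(size_wtd, 1), "%)")
labs
```

Swapping size_wtd for size_unw in the last step mimics the effect of weighted=FALSE, and dropping the "Cluster g: " prefix mimics cluster=FALSE.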
For MEDseq_nameclusts, a factor version of x$MAP with levels given by the output of MEDseq_clustnames.
Note
The main MEDseq_clustnames function is used internally by plot.MEDseq, MEDseq_meantime, MEDseq_stderr, and also other print and summary methods, where its invocation can typically be controlled via a SPS logical argument. However, the optional arguments cluster, size, and weighted can only be passed through plot.MEDseq; elsewhere cluster=TRUE, size=FALSE, and weighted=FALSE are always assumed.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
See Also
seqformat, seqclustname, plot.MEDseq, MEDseq_meantime, MEDseq_stderr
Examples
# Load the MVAD data
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x)
                        which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences  = mvad[,15:86],
                      weights    = mvad[,2])
mvad.cov <- mvad$covariates
# Create a state sequence object with the first two (summer) time points removed
states <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels <- c("Employment", "Further Education", "Higher Education",
            "Joblessness", "School", "Training")
mvad.seq <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)
# Fit a model with weights and a gating covariate
# Have the probability of noise-component membership depend on the covariate
mod <- MEDseq_fit(mvad.seq, G=5, modtype="UUN", weights=mvad$weights,
                  gating=~ gcse5eq, covars=mvad.cov, noise.gate=TRUE)
# Extract the names
names <- MEDseq_clustnames(mod, cluster=FALSE, size=TRUE)
# Get the renamed MAP cluster membership indicator vector
group <- MEDseq_nameclusts(names)
# Use the output in plots
plot(mod, type="d", soft=FALSE, weighted=FALSE, cluster=FALSE, size=TRUE, border=TRUE)
# same as:
# seqplot(mvad.seq, type="d", group=group)
# Indeed, this function is invoked by default for certain plot types
plot(mod, type="d", soft=TRUE, weighted=TRUE)
plot(mod, type="d", soft=TRUE, weighted=TRUE, SPS=FALSE)
# Invoke this function when printing the gating network coefficients
print(mod$gating, SPS=FALSE)
print(mod$gating, SPS=TRUE)
# Invoke this function in a call to MEDseq_meantime
MEDseq_meantime(mod, SPS=TRUE)
# Invoke this function in other plots
plot(mod, type="clusters", SPS=TRUE)
plot(mod, type="precision", SPS=TRUE)