MEDseq_entropy {MEDseq} | R Documentation |
Entropy of a fitted MEDseq model
Description
Calculates the normalised entropy of a fitted MEDseq model.
Usage
MEDseq_entropy(x,
               group = FALSE)
Arguments
x |
An object of class |
group |
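A logical (defaults to FALSE) indicating whether the observation-specific entropies should be averaged per component according to the MAP classification (see Details).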
Details
When group is FALSE, this function calculates the normalised entropy via

H=-\frac{1}{n\log(G)}\sum_{i=1}^n\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig}),

where n and G are the sample size and number of components, respectively, and \hat{z}_{ig} is the estimated posterior probability at convergence that observation i belongs to component g.
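As a minimal sketch of the formula above, the following base-R snippet computes H and 1-H from a small made-up matrix of posterior probabilities. The matrix z and the helper norm_entropy are hypothetical illustrations rather than part of the MEDseq API, and the convention 0 log(0) = 0 is an assumption of this sketch.

```r
# Hypothetical helper (not part of MEDseq): normalised entropy H of an n x G
# matrix of posterior membership probabilities whose rows sum to 1.
norm_entropy <- function(z) {
  n <- nrow(z)
  G <- ncol(z)
  # Assumed convention: entries with z = 0 contribute 0 to the sum
  terms <- ifelse(z > 0, z * log(z), 0)
  -sum(terms) / (n * log(G))
}

# Made-up posterior probabilities for n = 3 observations and G = 2 components
z <- rbind(c(0.9, 0.1),
           c(0.5, 0.5),
           c(0.2, 0.8))

H <- norm_entropy(z)
1 - H   # values closer to 1 indicate clearer separation
```

A matrix in which every row is uniform over the G components gives H = 1 (and hence 1-H = 0), the least separated case.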
When group is TRUE,

H_i=-\frac{1}{\log(G)}\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})

is computed for each observation and averaged according to the MAP classification.
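The per-observation calculation can be sketched in base R as follows. The matrix z is made up for illustration, the names H_i, MAP, and H_group are hypothetical, and the sketch follows the formula for H_i as written; it is not the package's internal code.

```r
# Made-up posterior probabilities for n = 4 observations and G = 2 components
z <- rbind(c(0.9, 0.1),
           c(0.6, 0.4),
           c(0.2, 0.8),
           c(0.5, 0.5))
G <- ncol(z)

# Observation-specific normalised entropies H_i (0 * log(0) taken as 0)
H_i <- -rowSums(ifelse(z > 0, z * log(z), 0)) / log(G)

# MAP classification: the component with the largest posterior probability
# (ties are broken in favour of the first component in this sketch)
MAP <- max.col(z, ties.method = "first")

# Average H_i within each MAP group, keeping one entry per component
H_group <- tapply(H_i, factor(MAP, levels = seq_len(G)), mean)
H_group
```

Using a factor with all G levels keeps the result at length G even when no observation is assigned to some component; such a component then has an NA average in this sketch.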
Value
When group is FALSE, a single number, given by 1-H, in the range [0,1], such that larger values indicate clearer separation of the clusters. Otherwise, a vector of length G containing the per-component averages of the observation-specific entropies is returned.
Note
This function will always return a normalised entropy of 1 for models fitted using the "CEM" algorithm (see MEDseq_control), or models with only one component.
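The "CEM" case can be illustrated with a short base-R sketch, under the assumption that such a fit yields hard 0/1 assignments in place of fractional posterior probabilities. The matrix z_hard is made up, and the convention 0 log(0) = 0 is assumed. With one component, log(G) = 0 and the formula is undefined, which is consistent with the value 1 being returned directly in that case.

```r
# Made-up hard (0/1) assignment matrix for n = 4 observations and G = 3
# components: each observation belongs to exactly one component
z_hard <- rbind(c(1, 0, 0),
                c(0, 1, 0),
                c(0, 0, 1),
                c(1, 0, 0))

# Assumed convention: 0 * log(0) is treated as 0, and 1 * log(1) is 0,
# so every term of the double sum vanishes
terms <- ifelse(z_hard > 0, z_hard * log(z_hard), 0)
H <- -sum(terms) / (nrow(z_hard) * log(ncol(z_hard)))

1 - H   # no classification uncertainty remains
```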
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
See Also
MEDseq_fit, MEDseq_control, MEDseq_AvePP
Examples
# Load the package and the MVAD data
library(MEDseq)
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x)
                        which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86],
                      weights = mvad[,2])
mvad.cov <- mvad$covariates
# Create a state sequence object with the first two (summer) time points removed
states <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels   <- c("Employment", "Further Education", "Higher Education",
              "Joblessness", "School", "Training")
mvad.seq <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)
# Fit a model with weights and a gating covariate
# Have the probability of noise-component membership be constant
mod <- MEDseq_fit(mvad.seq, G=11, modtype="UUN", weights=mvad$weights,
                  gating=~ gcse5eq, covars=mvad.cov, noise.gate=FALSE)
# Calculate the normalised entropy
MEDseq_entropy(mod)
# Calculate the normalised entropy per cluster
MEDseq_entropy(mod, group=TRUE)