R: Entropy of a fitted MEDseq model

MEDseq_entropy {MEDseq}

R Documentation

Entropy of a fitted MEDseq model

Description

Calculates the normalised entropy of a fitted MEDseq model.

Usage

MEDseq_entropy(x,
               group = FALSE)

Arguments

`x`	An object of class `"MEDseq"` generated by `MEDseq_fit` or an object of class `"MEDseqCompare"` generated by `MEDseq_compare`.
`group`	A logical (defaults to `FALSE`) indicating whether component-specific average entropies should be returned instead.

Details

When group is FALSE, this function calculates the normalised entropy via

H=-\frac{1}{n\log(G)}\sum_{i=1}^n\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})

where n and G are the sample size and number of components, respectively, and \hat{z}_{ig} is the estimated posterior probability at convergence that observation i belongs to component g.

When group is TRUE,

H_i=-\frac{1}{\log(G)}\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})

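is computed for each observation i, and these values are then averaged within each component, with observations assigned to components according to the maximum a posteriori (MAP) classification.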
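The two calculations above can be sketched directly from a matrix of posterior probabilities. This is a minimal illustration rather than the package's internal code: `norm_entropy` is a hypothetical helper, `z` is a made-up n x G matrix, sampling weights are ignored, and terms with \hat{z}_{ig} = 0 are taken to contribute zero (the usual convention for 0 log 0). The grouped branch averages the H_i values as described here; whether the package transforms them further is not stated in this page.

```r
# Hypothetical helper (not part of MEDseq): normalised entropy computed from
# an n x G matrix z of posterior membership probabilities.
norm_entropy <- function(z, group = FALSE) {
  G <- ncol(z)
  # Assumption: treat 0 * log(0) as 0 so that zero entries contribute nothing
  zlogz <- ifelse(z > 0, z * log(z), 0)
  # Observation-specific entropies H_i, scaled by log(G)
  H_i <- -rowSums(zlogz) / log(G)
  if (!group) {
    # H is the mean of the H_i; the reported value is 1 - H
    return(1 - mean(H_i))
  }
  # MAP classification: index of the most probable component per observation
  map <- factor(max.col(z, ties.method = "first"), levels = seq_len(G))
  # Average the H_i within each MAP-assigned component
  tapply(H_i, map, mean)
}

# Toy posterior probabilities for n = 4 observations and G = 2 components
z <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.3, 0.7,
              0.5, 0.5), ncol = 2, byrow = TRUE)

norm_entropy(z)                # single value, 1 - H
norm_entropy(z, group = TRUE)  # one average per component
```

In this toy example the fourth observation is tied between the two components; `ties.method = "first"` assigns it to the first one, which is an arbitrary choice made only for the sketch.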

Value

When group is FALSE, a single number given by 1-H, which lies in the range [0,1]; larger values indicate clearer separation of the clusters. When group is TRUE, a vector of length G containing the per-component averages of the observation-specific entropies.

Note

This function will always return a normalised entropy of 1 for models fitted using the "CEM" algorithm (see MEDseq_control), since its hard assignments make every \hat{z}_{ig} equal to 0 or 1, and for models with only one component.
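A short sketch of why hard assignments give a value of 1, assuming the formula from the Details section and the convention that 0 log 0 is taken as 0. The matrix `z_hard` is made up for illustration and is not output from MEDseq. Note also that with a single component log(G) = 0, so H is undefined by the formula and the returned value of 1 is best read as a convention.

```r
# Made-up hard (0/1) assignment matrix: 5 observations, 3 components
z_hard <- diag(3)[c(1, 2, 3, 1, 2), ]

# Assumption: treat 0 * log(0) as 0; entries equal to 1 give 1 * log(1) = 0
zlogz <- ifelse(z_hard > 0, z_hard * log(z_hard), 0)

# Every term in the double sum vanishes, so H is zero
H <- -sum(zlogz) / (nrow(z_hard) * log(ncol(z_hard)))

# Normalised entropy as reported when group is FALSE
entropy_hard <- 1 - H
entropy_hard
```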

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.

Examples

# Load the MVAD data
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a model with weights and a gating covariate
# Have the probability of noise-component membership be constant
mod           <- MEDseq_fit(mvad.seq, G=11, modtype="UUN", weights=mvad$weights, 
                            gating=~ gcse5eq, covars=mvad.cov, noise.gate=FALSE)

# Calculate the normalised entropy
MEDseq_entropy(mod)

# Calculate the normalised entropy per cluster
MEDseq_entropy(mod, group=TRUE)

[Package MEDseq version 1.4.1 Index]