RCLRMIX-class {rebmix}    R Documentation

Class "RCLRMIX"

Description

Object of class RCLRMIX.

Objects from the Class

Objects can be created by calls of the form new("RCLRMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL), a.pos(x = NULL), a.Zt(x = NULL), a.Zp(x = NULL, s = expression(c)), a.c(x = NULL), a.p(x = NULL, s = expression(c)), a.pi(x = NULL, s = expression(c)),
a.P(x = NULL, s = expression(c)), a.tau(x = NULL, s = expression(c)), a.prob(x = NULL), a.Rule(x = NULL), a.from(x = NULL), a.to(x = NULL), a.EN(x = NULL) and a.ED(x = NULL), where x stands for an object of class RCLRMIX and s a desired number of clusters for which the slot is calculated.

Slots

x:

an object of class REBMIX.

Dataset:

a data frame or an object of class Histogram to be clustered.

pos:

a desired row number in x@summary for which the clustering is performed. The default value is 1.

Zt:

a factor of true cluster membership.

Zp:

a factor of predictive cluster membership.

c:

number of nonempty clusters.

p:

a vector of length $c$ containing prior probabilities of cluster memberships $p_{l}$ summing to 1. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is numeric().

pi:

a list of length $d$ of matrices of size $c \times K_{i}$ containing cluster conditional probabilities $\pi_{ilk}$. Let $\pi_{ilk}$ denote the cluster conditional probability that an observation in cluster $l = 1, \ldots, c$ produces the $k$th outcome on the $i$th variable. Suppose we observe $i = 1, \ldots, d$ polytomous categorical variables (the manifest variables), each of which contains $K_{i}$ possible outcomes for observations $j = 1, \ldots, n$. A manifest variable is a variable that can be measured or observed directly. It must be coded as a whole number starting at zero for the first outcome and increasing to the possible number of outcomes minus one. It is presumed here that all variables are statistically independent within clusters and that $\bm{y}_{1}, \ldots, \bm{y}_{n}$ stands for an observed $d$ dimensional dataset of size $n$ of vector observations $\bm{y}_{j} = (y_{1j}, \ldots, y_{ij}, \ldots, y_{dj})^\top$. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is list().
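
The zero-based coding required for the manifest variables can be sketched in base R as follows. The variable colour and its levels are hypothetical, and factor() is only one of several ways to obtain such codes.

```r
# Hypothetical categorical variable with K = 3 possible outcomes.
colour <- c("red", "green", "blue", "green", "red")

# factor() sorts the levels alphabetically: blue, green, red.
# as.integer() returns codes 1, ..., K, so subtracting 1 gives 0, ..., K - 1.
coded <- as.integer(factor(colour)) - 1L

coded
# 2 1 0 1 2
```

The first outcome (blue) is coded as 0 and the last (red) as K - 1 = 2, which matches the coding described for this slot.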

P:

a data frame containing true $N_{\mathrm{t}}(\bm{y}_{\tilde{\jmath}})$ and predictive $N_{\mathrm{p}}(\bm{y}_{\tilde{\jmath}})$ frequencies calculated for unique $\bm{y}_{\tilde{\jmath}} \in \{ \bm{y}_{1}, \ldots, \bm{y}_{n} \}$, where $\tilde{\jmath} = 1, \ldots, \tilde{n}$ and $\tilde{n} \leq n$.

tau:

a matrix of size $n \times c$ containing conditional probabilities $\tau_{jl}$ that observations $\bm{y}_{1}, \ldots, \bm{y}_{n}$ arise from clusters $1, \ldots, c$.

prob:

a vector of length $c$ containing probabilities of correct clustering for $s = 1, \ldots, c$.

Rule:

a character string containing the merging rule, either "Entropy" or "Demp". The default value is "Entropy".

from:

a vector of length $c - 1$ containing the clusters that are merged into the clusters listed in slot to.

to:

a vector of length $c - 1$ containing the clusters that originate from the clusters listed in slot from.

EN:

a vector of length $c - 1$ containing entropies for combined clusters.

ED:

a vector of length $c - 1$ containing the decreases in entropy for combined clusters.

A:

an adjacency matrix of size $c_{\mathrm{max}} \times c_{\mathrm{max}}$, where $c_{\mathrm{max}} \geq c$.
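
The slots tau, EN and ED can be related through the entropy criterion of Baudry et al. (2010), cited under References. The following base R sketch computes the entropy of a soft clustering and the entropy decrease obtained by merging each pair of clusters. The matrix values and the helper names entropy and merge_clusters are made up for illustration, and it is an assumption that rebmix computes its EN and ED values in exactly this form.

```r
# Hypothetical posterior matrix tau (n = 4 observations, c = 3 clusters).
# Each row sums to 1.
tau <- matrix(c(0.90, 0.05, 0.05,
                0.20, 0.70, 0.10,
                0.10, 0.30, 0.60,
                0.45, 0.50, 0.05),
              ncol = 3, byrow = TRUE)

# Entropy of a soft clustering: -sum_j sum_l tau_jl * log(tau_jl),
# with the convention 0 * log(0) = 0.
entropy <- function(tau) {
  terms <- ifelse(tau > 0, tau * log(tau), 0)
  -sum(terms)
}

# Merging cluster `from` into cluster `to` adds their posterior columns
# and drops the column of the absorbed cluster.
merge_clusters <- function(tau, from, to) {
  tau[, to] <- tau[, to] + tau[, from]
  tau[, -from, drop = FALSE]
}

# Entropy decrease for every candidate pair of clusters.
candidates <- t(combn(ncol(tau), 2))
decrease <- apply(candidates, 1, function(p) {
  entropy(tau) - entropy(merge_clusters(tau, from = p[2], to = p[1]))
})

# The pair with the largest entropy decrease is the one merged first
# under the entropy criterion.
best <- candidates[which.max(decrease), ]
```

Repeating the merge step on the reduced matrix until a single cluster remains gives c - 1 merges, which is consistent with the length of the from, to, EN and ED slots.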

Author(s)

Marko Nagode, Branislav Panic

References

J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111

S. Kyoya and K. Yamanishi. Summarizing finite mixture model with overlapping quantification. Entropy, 23(11):1503, 2021. doi:10.3390/e23111503

Examples

devAskNewPage(ask = TRUE)

# Generate normal dataset.

n <- c(500, 200, 400)

Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)

a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)

normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n, Theta = a.Theta(Theta))

# Estimate number of components, component weights and component parameters.

normalest <- REBMIX(model = "REBMVNORM",
  Dataset = a.Dataset(normal),
  Preprocessing = "histogram",
  cmax = 6,
  Criterion = "BIC")

summary(normalest)

# Plot finite mixture.

plot(normalest)

# Cluster dataset.

normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = a.Zt(normal))

# Plot clusters.

plot(normalclu)

summary(normalclu)

[Package rebmix version 2.16.0 Index]