| RCLRMIX-class {rebmix} | R Documentation |
Class "RCLRMIX"
Description
Object of class RCLRMIX.
Objects from the Class
Objects can be created by calls of the form new("RCLRMIX", ...).
Accessor methods for the slots are a.Dataset(x = NULL), a.pos(x = NULL), a.Zt(x = NULL),
a.Zp(x = NULL, s = expression(c)), a.c(x = NULL),
a.p(x = NULL, s = expression(c)), a.pi(x = NULL, s = expression(c)),
a.P(x = NULL, s = expression(c)), a.tau(x = NULL, s = expression(c)),
a.prob(x = NULL), a.Rule(x = NULL), a.from(x = NULL), a.to(x = NULL),
a.EN(x = NULL) and a.ED(x = NULL), where x stands for an object of class RCLRMIX and s for the desired number of clusters for which the slot is calculated.
Slots
x:-
an object of class REBMIX.
Dataset:-
a data frame or an object of class Histogram to be clustered.
pos:-
a desired row number in x@summary for which the clustering is performed. The default value is 1.
Zt:-
a factor of true cluster membership.
Zp:-
a factor of predictive cluster membership.
c:-
number of nonempty clusters.
p:-
a vector of length $c$ containing the prior probabilities of cluster membership $p_{l}$ summing to 1. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is numeric().
pi:-
a list of length $d$ of matrices of size $c \times K_{i}$ containing the cluster conditional probabilities $\pi_{ilk}$. Let $\pi_{ilk}$ denote the cluster conditional probability that an observation in cluster $l = 1, \ldots, c$ produces the $k$th outcome on the $i$th variable. Suppose we observe $i = 1, \ldots, d$ polytomous categorical variables (the manifest variables), each of which has $K_{i}$ possible outcomes for observations $j = 1, \ldots, n$. A manifest variable is a variable that can be measured or observed directly. It must be coded as a whole number, starting at zero for the first outcome and increasing to the number of possible outcomes minus one. All variables are presumed to be statistically independent within clusters, and $\bm{y}_{1}, \ldots, \bm{y}_{n}$ stand for an observed $d$-dimensional dataset of size $n$ of vector observations $\bm{y}_{j} = (y_{1j}, \ldots, y_{ij}, \ldots, y_{dj})^\top$. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is list().
P:-
a data frame containing the true $N_{\mathrm{t}}(\bm{y}_{\tilde{\jmath}})$ and predictive $N_{\mathrm{p}}(\bm{y}_{\tilde{\jmath}})$ frequencies calculated for the unique $\bm{y}_{\tilde{\jmath}} \in \{ \bm{y}_{1}, \ldots, \bm{y}_{n} \}$, where $\tilde{\jmath} = 1, \ldots, \tilde{n}$ and $\tilde{n} \leq n$.
tau:-
a matrix of size $n \times c$ containing the conditional probabilities $\tau_{jl}$ that observations $\bm{y}_{1}, \ldots, \bm{y}_{n}$ arise from clusters $1, \ldots, c$.
prob:-
a vector of length $c$ containing the probabilities of correct clustering for $s = 1, \ldots, c$.
Rule:-
a character string containing the merging rule, one of "Entropy" or "Demp". The default value is "Entropy".
from:-
a vector of length $c - 1$ containing the clusters merged into the to clusters.
to:-
a vector of length $c - 1$ containing the clusters resulting from merging the from clusters.
EN:-
a vector of length $c - 1$ containing the entropies of the combined clusters.
ED:-
a vector of length $c - 1$ containing the decreases in entropy for the combined clusters.
A:-
an adjacency matrix of size $c_{\mathrm{max}} \times c_{\mathrm{max}}$, where $c_{\mathrm{max}} \geq c$.
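The manifest variables have to be coded as whole numbers starting at zero. A minimal base-R sketch of one way to prepare such data, assuming the raw categories are stored as character or factor columns of a data frame; the helper code_manifest() is hypothetical and not part of rebmix:

```r
# Hypothetical helper (not part of rebmix): recode every column of a data
# frame to whole numbers 0, 1, ..., K_i - 1.
code_manifest <- function(df) {
  as.data.frame(lapply(df, function(v) {
    f <- factor(v)       # levels are sorted; the first level becomes outcome 0
    as.integer(f) - 1L   # shift R's 1-based factor codes to 0-based outcomes
  }))
}

raw <- data.frame(colour = c("red", "blue", "red", "green"),
                  size   = c("S", "L", "M", "S"))

coded <- code_manifest(raw)
# Number of possible outcomes K_i per variable (here taken from the data).
K <- vapply(raw, function(v) nlevels(factor(v)), integer(1))

print(coded)
print(K)
```

Taking $K_{i}$ from the observed levels assumes every possible outcome occurs at least once; if some outcome can be unobserved, the levels should be passed to factor() explicitly.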
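For categorical data, the slots p, pi and tau are linked by Bayes' rule under the within-cluster independence assumed above: $\tau_{jl} = p_{l} \prod_{i=1}^{d} \pi_{ily_{ij}} \big/ \sum_{l'=1}^{c} p_{l'} \prod_{i=1}^{d} \pi_{il'y_{ij}}$. The sketch below computes tau from p and pi in base R and then assigns each observation to the cluster with the largest $\tau_{jl}$. It is illustrative only: the function latent_class_tau() is hypothetical, and the maximum a posteriori assignment used for the predictive membership is an assumption, not the rebmix implementation of slot Zp.

```r
# Illustrative sketch (hypothetical helper, not rebmix code): conditional
# probabilities tau_{jl} of a latent class model.
#   Y       n x d matrix of outcomes coded 0, ..., K_i - 1
#   p       length-c vector of prior probabilities p_l (sums to 1)
#   pi_list list of length d; pi_list[[i]] is a c x K_i matrix of pi_{ilk}
latent_class_tau <- function(Y, p, pi_list) {
  n <- nrow(Y)
  tau <- matrix(p, nrow = n, ncol = length(p), byrow = TRUE)  # each row = p
  for (i in seq_along(pi_list)) {
    # Column y_ij + 1 of pi_list[[i]] holds pi_{il y_ij} for every cluster l.
    tau <- tau * t(pi_list[[i]][, Y[, i] + 1, drop = FALSE])
  }
  tau / rowSums(tau)  # normalise so that each row sums to 1
}

p <- c(0.6, 0.4)
pi_list <- list(rbind(c(0.9, 0.1),          # variable 1, K_1 = 2
                      c(0.2, 0.8)),
                rbind(c(0.7, 0.2, 0.1),     # variable 2, K_2 = 3
                      c(0.1, 0.3, 0.6)))
Y <- rbind(c(0, 0), c(1, 2), c(0, 2))

tau <- latent_class_tau(Y, p, pi_list)
# Assumed maximum a posteriori rule for the predictive cluster membership.
Zp <- factor(max.col(tau, ties.method = "first"), levels = seq_along(p))
print(round(tau, 3))
print(Zp)
```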
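The true frequencies $N_{\mathrm{t}}$ in slot P are counts of the unique observations. The sketch below tabulates them in base R and, as an assumption about how predictive frequencies could be obtained from p and pi, sets $N_{\mathrm{p}}(\bm{y}_{\tilde{\jmath}}) = n \sum_{l=1}^{c} p_{l} \prod_{i=1}^{d} \pi_{ily_{i\tilde{\jmath}}}$. The helper freq_table() is hypothetical, and rebmix may compute $N_{\mathrm{p}}$ differently.

```r
# Illustrative sketch (hypothetical helper, not rebmix code): true frequencies
# N_t of the unique observations and, under an assumed definition, predictive
# frequencies N_p = n * P(y) from a latent class model with p and pi_list.
freq_table <- function(Y, p, pi_list) {
  key  <- apply(Y, 1, paste, collapse = ",")      # one string per row
  ukey <- unique(key)                             # unique rows, first-seen order
  U    <- Y[match(ukey, key), , drop = FALSE]
  Nt   <- as.vector(table(factor(key, levels = ukey)))
  prob <- apply(U, 1, function(u) {
    comp <- p                                     # p_l for every cluster l
    for (i in seq_along(pi_list)) comp <- comp * pi_list[[i]][, u[i] + 1]
    sum(comp)                                     # mixture probability P(u)
  })
  data.frame(U, Nt = Nt, Np = nrow(Y) * prob)
}

p <- c(0.6, 0.4)
pi_list <- list(rbind(c(0.9, 0.1), c(0.2, 0.8)),
                rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6)))
Y <- rbind(c(0, 0), c(1, 2), c(0, 0), c(0, 2))

P <- freq_table(Y, p, pi_list)
print(P)
```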
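The "Entropy" rule and the slots from, to, EN and ED can be illustrated with the entropy criterion of Baudry et al. (2010). The entropy of a clustering is $-\sum_{j=1}^{n} \sum_{l=1}^{c} \tau_{jl} \log \tau_{jl}$; merging two clusters replaces their columns of tau by their sum, and at each step the pair giving the largest decrease in entropy is merged until one cluster remains, which yields $c - 1$ merges. The base-R sketch below follows that scheme. The function names are hypothetical, and recording the dropped cluster in from and the retained one in to is an assumed convention, not necessarily the one used by rebmix.

```r
# Illustrative sketch of the entropy criterion of Baudry et al. (2010);
# hypothetical helpers, not the rebmix implementation.
entropy <- function(tau) {
  v <- tau[tau > 0]          # use the convention 0 * log(0) = 0
  -sum(v * log(v))
}

merge_by_entropy <- function(tau) {
  labels <- seq_len(ncol(tau))  # original cluster numbers of the columns
  from <- to <- integer(0)
  EN <- ED <- numeric(0)
  while (ncol(tau) > 1) {
    en0 <- entropy(tau)
    best_en <- Inf
    for (k in 1:(ncol(tau) - 1)) {
      for (l in (k + 1):ncol(tau)) {
        merged <- tau[, -l, drop = FALSE]   # drop column l ...
        merged[, k] <- tau[, k] + tau[, l]  # ... and add it to column k
        en <- entropy(merged)
        if (en < best_en) {
          best_en <- en; best_k <- k; best_l <- l; best_tau <- merged
        }
      }
    }
    from <- c(from, labels[best_l])  # assumed convention: l is merged into k
    to   <- c(to, labels[best_k])
    EN   <- c(EN, best_en)           # entropy after combining
    ED   <- c(ED, en0 - best_en)     # decrease in entropy
    labels <- labels[-best_l]
    tau <- best_tau
  }
  list(from = from, to = to, EN = EN, ED = ED)
}

tau <- rbind(c(0.50, 0.50, 0.0),
             c(0.45, 0.55, 0.0),
             c(0.00, 0.10, 0.9),
             c(0.00, 0.00, 1.0))
res <- merge_by_entropy(tau)
print(res)
```

Here clusters 1 and 2 overlap strongly in the first two observations, so they are combined first; the second merge then removes the remaining uncertainty of the third observation.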
Author(s)
Marko Nagode, Branislav Panic
References
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
S. Kyoya and K. Yamanishi. Summarizing finite mixture model with overlapping quantification. Entropy, 23(11):1503, 2021. doi:10.3390/e23111503
Examples
library(rebmix)
devAskNewPage(ask = TRUE)
# Generate normal dataset.
n <- c(500, 200, 400)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)
normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n, Theta = a.Theta(Theta))
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM",
                    Dataset = a.Dataset(normal),
                    Preprocessing = "histogram",
                    cmax = 6,
                    Criterion = "BIC")
summary(normalest)
# Plot finite mixture.
plot(normalest)
# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = a.Zt(normal))
# Plot clusters.
plot(normalclu)
summary(normalclu)