MonoClust.object {monoClust} | R Documentation |
Monothetic Clustering Tree Object
Description
The structure and objects contained in MonoClust, an object returned from the MonoClust() function and used as the input in other functions in the package.
Value
- frame
Data frame in the form of a tibble::tibble() representing a tree structure with one row for each node. The columns include:
- number
Index of the node. Depth of a node can be derived by number %/% 2.
- var
Name of the variable used in the split at a node, or "<leaf>" if it is a leaf node.
- cut
Splitting value: values of var smaller than the cut go to the left branch, while values greater than it go to the right branch.
- n
Cluster size, the number of observations in that cluster.
- inertia
Inertia value of the cluster at that node.
- bipartsplitrow
Position of the next split row in the data set. That row belongs to the left (smaller) node.
- bipartsplitcol
Position of the next split variable in the data set.
- inertiadel
Proportion of the inertia of the cluster at that node to the inertia of the root.
- medoid
Position of the data point regarded as the medoid of its cluster.
- loc
y-coordinate of the splitting node to facilitate showing on the tree. See plot.MonoClust() for details.
- split.order
Order of the splits, with the root being 0.
- inertia_explained
Percent inertia explained as described in Chavent et al. (2007). It is 1 - (sum(current inertia)/inertia[1]).
- alt
A nested tibble of alternate splits at a node. It contains bipartsplitrow and bipartsplitcol with the same meaning as above. Note that this is for information purposes only. Currently monoClust does not support choosing an alternate splitting route. If needed, MonoClust() can be run step by step with nclusters = 2.
- membership
Vector of the same length as the number of rows in the data, containing the value of frame$number corresponding to the leaf node that an observation falls into.
- dist
Distance matrix calculated using the method indicated in the distmethod argument of MonoClust().
- terms
Vector of variable names in the data that were used to split.
- centroids
Data frame with one row of centroid values for each cluster.
- medoids
Named vector of positions of the data points regarded as medoids of clusters.
- alt
Indicator of whether an alternate splitting route occurred when splitting.
- circularroot
List of values designed for a circular variable in the data set. var is the name of the circular variable and cut is its first best split value. If no circular variable is available, both objects are NULL.
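The relationship between frame, membership, and the inertia explained formula described above can be illustrated with a minimal sketch. It does not call MonoClust(). Instead it builds a small hypothetical stand-in list (named mock here) holding only a few of the documented components, and every number in it is invented for illustration. A plain data.frame is used in place of a tibble so that the sketch runs in base R, and only the final partition is considered, whereas a fitted object records inertia_explained in each row of frame.

```r
# Hypothetical stand-in for an object returned by MonoClust(). All values
# below are invented for illustration and do not come from a real fit.
frame <- data.frame(
  number  = c(1, 2, 3, 6, 7),
  var     = c("x", "<leaf>", "y", "<leaf>", "<leaf>"),
  cut     = c(2.5, NA, 10, NA, NA),
  n       = c(6, 2, 4, 3, 1),
  inertia = c(40, 3, 12, 4, 0),
  stringsAsFactors = FALSE
)
mock <- list(
  frame      = frame,
  membership = c(2, 2, 6, 6, 6, 7)
)

# Rows whose var is "<leaf>" are the final clusters.
leaves <- mock$frame[mock$frame$var == "<leaf>", ]

# membership holds the frame$number of the leaf each observation falls into,
# so tabulating it should reproduce the n column of the leaf rows.
sizes <- table(mock$membership)
sizes_match <- all(as.integer(sizes[as.character(leaves$number)]) == leaves$n)

# Share of the root inertia explained by the final partition:
# 1 - (sum of leaf inertias) / (root inertia).
explained <- 1 - sum(leaves$inertia) / mock$frame$inertia[1]
```

With these invented values, sizes_match is TRUE and explained evaluates to 0.825. On a real fit, the same expressions would be applied to the frame and membership components of the returned object.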
References
Chavent, M., Lechevallier, Y., & Briant, O. (2007). DIVCLUS-T: A monothetic divisive hierarchical clustering method. Computational Statistics & Data Analysis, 52(2), 687-701. doi: 10.1016/j.csda.2007.03.013.