greencut {greenclust} | R Documentation |
Cut a Greenclust Tree into Optimal Groups
Description
Cuts a greenclust tree at an automatically determined number of groups.
Usage
greencut(g, k = NULL, h = NULL)
Arguments
g | a tree as produced by greenclust
k | an integer scalar with the desired number of groups
h | a numeric scalar with the desired height where the tree should be cut
Details
The cut point is calculated by finding the number of groups/clusters that results in a collapsed contingency table with the most-significant (lowest p-value) chi-squared test. If there are ties, the smallest number of groups wins.
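The selection rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the tree here is a stand-in built with hclust() on row proportions (greenclust() builds its tree differently), and the names category, ks, pvals and best_k are hypothetical.

```r
# Contingency table of passenger category by survival, from the built-in
# Titanic data; categories with no passengers are dropped.
df <- as.data.frame(Titanic)
category <- interaction(df$Class, df$Sex, df$Age)
tab <- tapply(df$Freq, list(category, df$Survived), sum)
tab <- tab[rowSums(tab) > 0, ]

# ASSUMPTION: a stand-in tree from hclust(), used only to have something to cut.
hc <- hclust(dist(prop.table(tab, 1)), method = "average")

# For each candidate number of groups, collapse the rows by cluster membership
# and record the chi-squared p-value of the collapsed table.
ks <- 2:nrow(tab)
pvals <- sapply(ks, function(k) {
  membership <- cutree(hc, k = k)
  collapsed <- rowsum(tab, membership)
  suppressWarnings(chisq.test(collapsed, correct = FALSE)$p.value)
})

# which.min() returns the first minimum, so ties go to the smallest k.
best_k <- ks[which.min(pvals)]
groups <- cutree(hc, k = best_k)
```

Because ks is in increasing order, taking the first minimum is what implements the "smallest number of groups wins" tie-break.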
If a certain number of groups is required or a specific r-squared (1 - height) threshold is targeted, values for either k or h may be provided. (While the regular cutree function could also be used in this circumstance, it may still be useful to have the additional attributes that greencut() provides.)
As with cutree(), k overrides h if both are given.
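The precedence of k over h can be checked with the base cutree() function that this behaviour mirrors. The sketch below uses an ordinary hclust() tree on the built-in USArrests data as a stand-in; the cut values 3 and 50 are arbitrary choices for illustration.

```r
# A generic hierarchical clustering tree to cut (not a greenclust tree).
hc <- hclust(dist(USArrests))

# Cut by number of groups only, then with both k and h supplied.
by_k    <- cutree(hc, k = 3)
by_both <- cutree(hc, k = 3, h = 50)

# When k is given, h is ignored, so both calls give the same memberships.
same_result <- identical(by_k, by_both)
print(same_result)

# Cutting by height alone may give a different number of groups.
by_h <- cutree(hc, h = 50)
print(length(unique(by_h)))
```

For a greenclust tree, where r-squared is 1 - height, a target r-squared of 0.9 would correspond to h = 0.1.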
Value
greencut returns a vector of group memberships, with the resulting r-squared value and p-value as object attributes, accessible via attr.
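As a hedged sketch of how the returned attributes might be inspected: this page does not state the attribute names, so the code lists whatever is attached rather than assuming them. It runs only if the greenclust package is installed; the variable name clusters is illustrative.

```r
if (requireNamespace("greenclust", quietly = TRUE)) {
  # Same table construction as in the Examples section of this page.
  tab <- t(as.data.frame(apply(Titanic, 4:1, FUN = sum)))
  tab <- tab[apply(tab, 1, sum) > 0, ]

  clusters <- greenclust::greencut(greenclust::greenclust(tab))

  # List the attribute names instead of assuming them; an individual value
  # can then be read with attr(clusters, "<name>").
  print(names(attributes(clusters)))

  # as.vector() drops all attributes, leaving a plain, unnamed membership vector.
  print(as.vector(clusters))
}
```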
References
Greenacre, M.J. (1988) "Clustering the Rows and Columns of a Contingency Table," Journal of Classification 5, 39-51. doi:10.1007/BF01901670
See Also
greenclust, greenplot, assign.cluster
Examples
# Combine Titanic passenger attributes into a single category
# and create a contingency table for the non-zero levels
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN=sum)))
tab <- tab[apply(tab, 1, sum) > 0, ]
grc <- greenclust(tab)
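# Cut the tree once and reuse the result
clusters <- greencut(grc)
clusters
# Plot the tree and outline each group in its own color
plot(grc)
rect.hclust(grc, max(clusters),
            border = unique(clusters) + 1)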