greenclust {greenclust} | R Documentation |
Row Clustering Using Greenacre's Method
Description
Iteratively collapses the rows of a table (typically a contingency table), at each step merging the pair of rows whose combination causes the smallest loss of chi-squared.
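A single step of this greedy search can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the table counts are made up, and the helpers chisq_stat and merge_rows are hypothetical names introduced here.

```r
# Hypothetical 4x3 contingency table (made-up counts, for illustration only)
tab <- matrix(c(20, 15,  5,
                18, 17,  6,
                 5, 10, 25,
                 7,  9, 30),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), c("x", "y", "z")))

# Pearson chi-squared statistic, no continuity correction (hypothetical helper)
chisq_stat <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - expected)^2 / expected)
}

# Replace rows i and j with their sum (hypothetical helper)
merge_rows <- function(m, i, j) {
  merged <- m[i, , drop = FALSE] + m[j, , drop = FALSE]
  rownames(merged) <- paste(rownames(m)[i], rownames(m)[j], sep = "+")
  rbind(m[-c(i, j), , drop = FALSE], merged)
}

# Chi-squared lost by merging each candidate pair of rows
pairs <- t(combn(nrow(tab), 2))
full <- chisq_stat(tab)
loss <- apply(pairs, 1, function(p) {
  full - chisq_stat(merge_rows(tab, p[1], p[2]))
})

# The pair with the smallest loss is the one collapsed at this step
best <- pairs[which.min(loss), ]
step1 <- merge_rows(tab, best[1], best[2])
rownames(tab)[best]
```

Repeating the same search on step1, and on each subsequent table, until only one row remains would give the full sequence of merges.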
Usage
greenclust(x, correct = FALSE, verbose = FALSE)
Arguments
x |
a numeric matrix or data frame |
correct |
a logical indicating whether to apply a continuity correction if and when the clustered table reaches a 2x2 dimension. |
verbose |
if TRUE, prints the clustered table along with r-squared and p-value at each step |
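To see what a continuity correction does to a 2x2 table, the sketch below uses base R's chisq.test(), whose own correct argument applies Yates' correction. Treating greenclust's correct argument as analogous is an assumption made here, and the counts are made up.

```r
# Hypothetical 2x2 table (made-up counts, for illustration only)
m <- matrix(c(12, 5, 7, 9), nrow = 2)

# Same table tested without and with the continuity correction
uncorrected <- chisq.test(m, correct = FALSE)
corrected <- chisq.test(m, correct = TRUE)

# The correction shrinks the statistic, so the p-value grows
c(uncorrected = unname(uncorrected$p.value),
  corrected = unname(corrected$p.value))
```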
Value
An object of class greenclust, which is compatible with most hclust
object functions, such as plot() and rect.hclust(). The height vector
represents the proportion of chi-squared, relative to the original table,
seen at each clustering step. The greenclust object also includes a vector
for the chi-squared test p-value at each step and a boolean vector
indicating whether the step had a tie for "winner".
References
Greenacre, M.J. (1988) "Clustering the Rows and Columns of a Contingency Table," Journal of Classification 5, 39-51. doi:10.1007/BF01901670
See Also
greencut, greenplot, assign.cluster
Examples
# Combine Titanic passenger attributes into a single category
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN=sum)))
# Remove rows with all zeros
tab <- tab[rowSums(tab) > 0, ]
# Perform clustering on contingency table
grc <- greenclust(tab)
# Plot r-squared and p-values for each potential cut point
greenplot(grc)
# Get clusters at suggested cut point
clusters <- greencut(grc)
# Plot dendrogram with clusters marked
plot(grc)
rect.hclust(grc, max(clusters))