R: Cluster analysis via K-means algorithm

clukm {lmomRFA}

R Documentation

Cluster analysis via K-means algorithm

Description

Performs cluster analysis using the K-means algorithm.

Usage

clukm(x, assign, maxit = 10, algorithm = "Hartigan-Wong")

Arguments

`x`	A numeric matrix (or a data frame with all numeric columns, which will be coerced to a matrix). Contains the data: each row should contain the attributes for a single point.
`assign`	A vector whose distinct values indicate the initial clustering of the points.
`maxit`	Maximum number of iterations.
`algorithm`	Clustering algorithm. Permitted values are the same as for `kmeans`.

Value

An object of class kmeans. For details see the help for kmeans.

Note

clukm is a wrapper for the R function kmeans. The only difference is that in clukm the user supplies an initial assignment of sites to clusters (from which cluster centers are computed), whereas in kmeans the user supplies the initial cluster centers explicitly.

Author(s)

J. R. M. Hosking jrmhosking@gmail.com

References

Hosking, J. R. M., and Wallis, J. R. (1997). Regional frequency analysis: an approach based on L-moments. Cambridge University Press.

Examples

## Clustering of gaging stations in Appalachia, as in Hosking
## and Wallis (1997, sec. 9.2.3)
data(Appalach)
# Form attributes for clustering (Hosking and Wallis's Table 9.4)
att <- cbind(a1 = log(Appalach$area),
             a2 = sqrt(Appalach$elev),
             a3 = Appalach$lat,
             a4 = Appalach$long)
att <- apply(att, 2, function(x) x/sd(x))
att[,1] <- att[,1] * 3
# Clustering by Ward's method
(cl <- cluagg(att))
# Details of the clustering with 7 clusters
(inf <- cluinf(cl, 7))
# Refine the 7 clusters by K-means
clkm <- clukm(att, inf$assign)
# Compare the original and K-means clusters
table(Kmeans=clkm$cluster, Ward=inf$assign)
# Some details about the K-means clusters: range of area, number
# of sites, weighted average L-CV and L-skewness
bb <- by(Appalach, clkm$cluster, function(x)
  c( min.area = min(x$area),
     max.area = max(x$area),
     n = nrow(x),
     ave.t = round(weighted.mean(x$t, x$n), 3),
     ave.t_3 = round(weighted.mean(x$t_3, x$n), 3)))
# Order the clusters in increasing order of minimum area
ord <- order(sapply(bb, "[", "min.area"))
# Make the result into a data frame.  Compare with Hosking
# and Wallis (1997), Table 9.5.
do.call(rbind, bb[ord])

[Package lmomRFA version 3.6 Index]