tidyst_kms {eks} | R Documentation |
Tidy and geospatial kernel mean shift clustering
Description
Tidy and geospatial versions of kernel mean shift clustering for 1- and 2-dimensional data.
Usage
tidy_kms(data, ...)
st_kms(x, ...)
Arguments
data | data frame/tibble of data values
x | sf object with point geometry
... | other parameters in
Details
Mean shift clustering is a generalisation of k-means clustering (a form of unsupervised learning) which allows for non-ellipsoidal clusters and does not require the number of clusters to be pre-specified. The mean shift clusters are determined by following each initial observation along its density gradient ascent path to a cluster centre, i.e. a local mode of the density estimate. Observations whose paths end at the same centre receive the same cluster label.
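The gradient ascent idea described above can be sketched in a few lines of base R. This is an illustrative sketch only, not the code used by ks or eks: the function name mean_shift_sketch, the scalar bandwidth h, the Gaussian kernel, and the single-linkage merging of end points at height h/2 are all assumptions made for this example.

```r
## Illustrative mean shift sketch in base R (hypothetical helper, not the ks/eks code)
mean_shift_sketch <- function(x, h, max_iter = 200, tol = 1e-6) {
  x <- as.matrix(x)   # n x d matrix of observations
  y <- x              # current positions; each path starts at an observation
  for (iter in seq_len(max_iter)) {
    y_new <- y
    for (i in seq_len(nrow(y))) {
      ## Gaussian kernel weights of all observations relative to y[i, ]
      d2 <- rowSums(sweep(x, 2, y[i, ])^2)
      w <- exp(-d2 / (2 * h^2))
      ## the kernel-weighted mean is one step along the density gradient ascent path
      y_new[i, ] <- colSums(x * w) / sum(w)
    }
    shift <- max(abs(y_new - y))
    y <- y_new
    if (shift < tol) break
  }
  ## paths ending (almost) at the same point share a label; the h/2 cut is an assumption
  cutree(hclust(dist(y), method = "single"), h = h / 2)
}

## usage: two well-separated 1-d groups
set.seed(1)
x <- c(rnorm(20, mean = 0, sd = 0.3), rnorm(20, mean = 10, sd = 0.3))
lab <- mean_shift_sketch(x, h = 1)
table(lab)
```

The number of clusters is never supplied: it falls out of how many distinct end points the ascent paths reach, which in turn depends on the bandwidth h.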
For details of the computation and the bandwidth selection procedure of the kernel mean shift clustering, see ?ks::kms. The bandwidth matrix of smoothing parameters is computed as in ks::kdde(deriv_order=1).
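As a rough illustration of the underlying ks calls, the sketch below selects a plug-in bandwidth matrix for first-derivative estimation and passes it to ks::kms directly. It assumes the ks package is installed, uses the built-in faithful data set as a stand-in, and uses the ks argument spelling deriv.order. Whether this reproduces exactly the bandwidth chosen inside tidy_kms or st_kms is an assumption, not something stated on this page.

```r
## Sketch only: calls ks directly, guarded in case ks is not installed
if (requireNamespace("ks", quietly = TRUE)) {
  x <- as.matrix(faithful)            # 2-d example data from the datasets package
  ## plug-in bandwidth matrix for density gradient (first derivative) estimation
  H <- ks::Hpi(x, deriv.order = 1)
  ## mean shift clustering with an explicitly supplied bandwidth matrix
  fit <- ks::kms(x, H = H)
  print(fit$nclust)                   # number of clusters found
  print(table(fit$label))             # cluster sizes
}
```

Supplying H explicitly makes it easier to see how the cluster count responds to the amount of smoothing, for example by rescaling H and rerunning ks::kms.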
Value
The output from *_kms has the same structure as the kernel density estimate from *_kde, except that x,y indicate the data points rather than the grid points, and estimate indicates the mean shift cluster label of the data points, rather than the density values.
Examples
## tidy 2-d mean shift clustering
library(ggplot2)
data(crabs, package="MASS")
crabs2 <- dplyr::select(crabs, FL, CW)
t1 <- tidy_kms(crabs2)
## convex hulls of clusters
t2 <- dplyr::group_by(t1, estimate)
t2 <- dplyr::slice(t2, chull(FL,CW))
gt <- ggplot(t1, aes(x=FL, y=CW))
gt + geom_point(aes(colour=estimate)) +
    geom_polygon(data=t2, aes(fill=estimate), alpha=0.1)
## geospatial mean shift clustering
data(wa)
data(grevilleasf)
hakeoides <- dplyr::filter(grevilleasf, species=="hakeoides")
s1 <- st_kms(hakeoides)
## convex hulls of clusters
s2 <- dplyr::group_by(s1$sf, estimate)
s2 <- dplyr::summarise(s2, geometry=sf::st_combine(geometry))
s2 <- sf::st_convex_hull(s2)
## base R plot
xlim <- c(1.2e5, 1.1e6); ylim <- c(6.1e6, 7.2e6)
plot(wa, xlim=xlim, ylim=ylim)
plot(s1, add=TRUE, pch=16, pal=function(.){
    colorspace::qualitative_hcl(n=., palette="Set2")})
plot(s2, add=TRUE, lty=3, pal=function(.){
    colorspace::qualitative_hcl(n=., palette="Set2", alpha=0.15)})
## geom_sf plot
gs <- ggplot(s1) + geom_sf(data=wa, fill=NA) + ggthemes::theme_map()
gs + geom_sf(data=s1$sf, aes(colour=estimate), alpha=0.5) +
    geom_sf(data=s2, aes(fill=estimate), linetype="dotted", alpha=0.15) +
    colorspace::scale_colour_discrete_qualitative(palette="Set2") +
    colorspace::scale_fill_discrete_qualitative(palette="Set2") +
    coord_sf(xlim=xlim, ylim=ylim)