tidyst_kms {eks}    R Documentation

Tidy and geospatial kernel mean shift clustering

Description

Tidy and geospatial versions of kernel mean shift clustering for 1- and 2-dimensional data.

Usage

tidy_kms(data, ...)
st_kms(x, ...)

Arguments

data

data frame/tibble of data values

x

sf object with point geometry

...

other arguments passed to the ks::kms function

Details

Mean shift clustering is a generalisation of k-means clustering (a form of unsupervised learning) that allows for non-ellipsoidal clusters and does not require the number of clusters to be pre-specified. The mean shift clusters are determined by following each observation along its density gradient ascent path to a cluster centre, i.e. a local mode of the kernel density estimate; observations that converge to the same mode are assigned to the same cluster.
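The gradient ascent described above can be illustrated with a minimal base R sketch. It assumes a Gaussian kernel with a single scalar bandwidth h, whereas ks::kms works with a full bandwidth matrix; the function name mean_shift_path is hypothetical and this is not the ks implementation. Each step moves a point to the kernel-weighted mean of the data, which for a Gaussian kernel is a step up the density gradient.

```r
## Hypothetical sketch of the mean shift iteration (not ks::kms).
## Assumes a Gaussian kernel with scalar bandwidth h.
mean_shift_path <- function(X, h, max_iter = 400, tol = 1e-8) {
  X <- as.matrix(X)
  Y <- X                                   # start every path at its observation
  for (iter in seq_len(max_iter)) {
    Y_new <- Y
    for (i in seq_len(nrow(Y))) {
      # squared distances from the current position to every observation
      d2 <- colSums((t(X) - Y[i, ])^2)
      w <- exp(-d2 / (2 * h^2))            # Gaussian kernel weights
      Y_new[i, ] <- colSums(w * X) / sum(w) # kernel-weighted mean = mean shift step
    }
    shift <- max(abs(Y_new - Y))           # largest move in this iteration
    Y <- Y_new
    if (shift < tol) break                 # all paths have (numerically) stopped
  }
  Y                                        # endpoints, i.e. approximate local modes
}
```

For two well separated groups of points, the endpoints of each group collapse onto one mode, so rows sharing an endpoint form one cluster.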

For details of the computation and the bandwidth selection procedure of kernel mean shift clustering, see ?ks::kms. The bandwidth matrix of smoothing parameters is computed as in ks::kdde(deriv.order=1), i.e. it is optimised for density gradient estimation rather than density estimation.
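A bandwidth matrix can also be supplied explicitly, since `...` is passed on to ks::kms, which accepts an `H` argument. As a simpler stand-in for the default plug-in selector (see ?ks::kms), the sketch below computes a normal-scale bandwidth for r-th order derivative estimation, H = (4 / (n (d + 2r + 2)))^(2 / (d + 2r + 4)) S, with S the sample covariance matrix. The helper name hns_deriv is hypothetical, and this normal-scale rule is a swapped-in technique, not what *_kms uses by default.

```r
## Hypothetical helper: normal-scale bandwidth matrix for derivative order r.
## Not the default plug-in selector used by ks::kms.
hns_deriv <- function(x, deriv_order = 1) {
  x <- as.matrix(x)
  n <- nrow(x)                  # sample size
  d <- ncol(x)                  # dimension (1 or 2 here)
  r <- deriv_order              # r = 1 for the density gradient
  (4 / (n * (d + 2 * r + 2)))^(2 / (d + 2 * r + 4)) * stats::var(x)
}
```

With the crabs example data, this could be used as `tidy_kms(crabs2, H=hns_deriv(crabs2))`; normal-scale bandwidths tend to oversmooth multimodal data, which may merge clusters.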

Value

The output from *_kms has the same structure as the kernel density estimate from *_kde, except that x,y indicate the data points rather than the grid points, and estimate indicates the mean shift cluster label of each data point rather than a density value.
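The tidy output described above, one row per data point carrying its cluster label, can be mimicked with a small base R sketch. It assumes the mean shift endpoints have already been computed and groups endpoints that lie within a tolerance of an existing mode; the function name label_endpoints and the greedy grouping rule are hypothetical, and ks::kms merges clusters with its own tolerance settings. Only the coordinate columns and estimate are reproduced here, not any other columns or attributes of the real *_kms output.

```r
## Hypothetical sketch: label mean shift endpoints and return a data frame
## with the original coordinate columns plus a factor column `estimate`.
label_endpoints <- function(data, endpoints, tol = 1e-3) {
  endpoints <- as.matrix(endpoints)
  modes <- endpoints[1, , drop = FALSE]  # distinct modes found so far
  label <- integer(nrow(endpoints))
  for (i in seq_len(nrow(endpoints))) {
    # distance from this endpoint to every known mode
    dist <- sqrt(colSums((t(modes) - endpoints[i, ])^2))
    if (min(dist) < tol) {
      label[i] <- which.min(dist)        # joins an existing cluster
    } else {
      modes <- rbind(modes, endpoints[i, ])
      label[i] <- nrow(modes)            # starts a new cluster
    }
  }
  out <- as.data.frame(data)             # keep the original data points
  out$estimate <- factor(label)          # cluster label per data point
  out
}
```

As in the crabs example, the coordinate columns keep their original names (e.g. FL, CW), and estimate can be used directly for colouring or grouping.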

Examples

## tidy 2-d mean shift clustering  
data(crabs, package="MASS")
crabs2 <- dplyr::select(crabs, FL, CW)
t1 <- tidy_kms(crabs2)
## convex hulls of clusters
t2 <- dplyr::group_by(t1, estimate)
t2 <- dplyr::slice(t2, chull(FL,CW))

gt <- ggplot2::ggplot(t1, ggplot2::aes(x=FL, y=CW)) 
gt + ggplot2::geom_point(ggplot2::aes(colour=estimate)) +
    ggplot2::geom_polygon(data=t2, ggplot2::aes(fill=estimate), alpha=0.1)

## geospatial mean shift clustering 
data(wa)
data(grevilleasf)
hakeoides <- dplyr::filter(grevilleasf, species=="hakeoides")
s1 <- st_kms(hakeoides)
## convex hulls of clusters
s2 <- dplyr::group_by(s1$sf, estimate)
s2 <- dplyr::summarise(s2, geometry=sf::st_combine(geometry))
s2 <- sf::st_convex_hull(s2)

## base R plot
xlim <- c(1.2e5, 1.1e6); ylim <- c(6.1e6, 7.2e6)
plot(wa, xlim=xlim, ylim=ylim)
plot(s1, add=TRUE, pch=16, pal=function(.){
    colorspace::qualitative_hcl(n=., palette="Set2")})
plot(s2, add=TRUE, lty=3, pal=function(.){
    colorspace::qualitative_hcl(n=., palette="Set2", alpha=0.15)})

## geom_sf plot
gs <- ggplot2::ggplot(s1) + ggplot2::geom_sf(data=wa, fill=NA) +
    ggthemes::theme_map()
gs + ggplot2::geom_sf(data=s1$sf, ggplot2::aes(colour=estimate), alpha=0.5) + 
    ggplot2::geom_sf(data=s2, ggplot2::aes(fill=estimate), linetype="dotted", 
    alpha=0.15) + 
    colorspace::scale_colour_discrete_qualitative(palette="Set2") +
    colorspace::scale_fill_discrete_qualitative(palette="Set2") +
    ggplot2::coord_sf(xlim=xlim, ylim=ylim) 

[Package eks version 1.0.4 Index]