clustering_partition {bigDM}    R Documentation

Obtain a partition of the spatial domain using the density-based spatial clustering (DBSC) algorithm described in Santafé et al. (2021)

Description

The function takes an object of class SpatialPolygonsDataFrame or sf and defines a spatial partition using the DBSC algorithm described in Santafé et al. (2021).

Usage

clustering_partition(
  carto,
  ID.area = NULL,
  var = NULL,
  n.cluster = 10,
  min.size = NULL,
  W = NULL,
  l = 1,
  Wk = NULL,
  distance = "euclidean",
  verbose = TRUE
)

Arguments

carto

object of class SpatialPolygonsDataFrame or sf.

ID.area

character; name of the variable that contains the IDs of spatial areal units.

var

character; name of the variable that contains the data of interest to compute spatial clusters, usually the vector of log-SMR.

n.cluster

numeric; value to fix the number of cluster centers in the DBSC algorithm. Defaults to 10.

min.size

numeric (default NULL); value to fix the minimum number of areas in each group of the spatial partition.

W

optional argument with the binary adjacency matrix of the spatial areal units. If NULL (default), this object is computed from the carto argument (two areas are considered as neighbours if they share a common border).

l

numeric value with the neighbourhood order used to assign areas to each cluster. If l=1 (default), only areas that share a common border are considered.

Wk

previously computed binary adjacency matrix of l-order neighbours. If this argument is included (default NULL), the parameter l is ignored.

distance

character; the distance measure to be used (default "euclidean"). See the method argument of the dist function for other options.

verbose

logical value (default TRUE); indicates if the function runs in verbose mode.
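
To make the relation between the W, l and Wk arguments concrete, the following base R sketch builds a binary matrix of l-order neighbours from a first-order adjacency matrix. It assumes that "l-order neighbours" means areas reachable in at most l steps across shared borders; the package may compute this differently. The toy matrix and the helper lorder_adjacency are hypothetical and are not part of bigDM.

```r
# Toy first-order adjacency matrix for 5 areas arranged in a line: 1-2-3-4-5
W <- matrix(0, 5, 5)
for (i in 1:4) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

# Hypothetical helper (not a bigDM function): binary matrix whose (i, j)
# entry is 1 if area j can be reached from area i in at most l steps
lorder_adjacency <- function(W, l = 1) {
  n <- nrow(W)
  reach <- diag(n)          # every area reaches itself in 0 steps
  step <- W + diag(n)       # stay in place or move to a first-order neighbour
  for (k in seq_len(l)) {
    reach <- (reach %*% step > 0) * 1
  }
  diag(reach) <- 0          # an area is not its own neighbour
  reach
}

Wk <- lorder_adjacency(W, l = 2)
Wk
```

With l = 1 the helper returns W itself. A matrix of this kind is what the Wk argument expects, in which case l is ignored.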

Details

The DBSC algorithm implemented in this function is a spatial clustering algorithm based on the density clustering algorithm introduced by Rodriguez and Laio (2014) and its later modification by Wang and Song (2016). It obtains a single clustering partition of the data by automatically detecting cluster centers and assigning each area to its nearest cluster center. The algorithm rests on the assumption that cluster centers are points with high local density that lie relatively far from any other point with higher local density. See Santafé et al. (2021) for more details.
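
The density-peaks idea behind this assumption can be illustrated with a minimal base R sketch on toy one-dimensional data. It follows the general scheme of Rodriguez and Laio (2014) only, and it is not the DBSC implementation used by clustering_partition, which works on areal units and their neighbourhood structure. The data, the cutoff d.cut, the use of a Gaussian kernel for the local density, and the choice of two centers are all assumptions made for illustration.

```r
# Toy data: two well-separated groups on the real line
x <- c(1.0, 1.1, 1.25, 1.5, 5.0, 5.1, 5.3)
D <- as.matrix(dist(x, method = "euclidean"))
n <- length(x)

# Local density with a Gaussian kernel (assumed cutoff); subtract the self term
d.cut <- 0.25
rho <- rowSums(exp(-(D / d.cut)^2)) - 1

# delta: distance to the nearest point with higher density.
# The point with the highest density gets its largest distance instead.
delta <- numeric(n)
nearest.higher <- rep(NA_integer_, n)
for (i in seq_len(n)) {
  higher <- which(rho > rho[i])
  if (length(higher) == 0) {
    delta[i] <- max(D[i, ])
  } else {
    j <- higher[which.min(D[i, higher])]
    delta[i] <- D[i, j]
    nearest.higher[i] <- j
  }
}

# Cluster centers: points with both high density and large delta
n.centers <- 2
centers <- order(rho * delta, decreasing = TRUE)[seq_len(n.centers)]

# Visit points by decreasing density; each non-center point inherits
# the label of its nearest higher-density point
cluster <- rep(NA_integer_, n)
cluster[centers] <- seq_along(centers)
for (i in order(rho, decreasing = TRUE)) {
  if (is.na(cluster[i])) cluster[i] <- cluster[nearest.higher[i]]
}
cluster
```

In this toy case the two selected centers are the densest point of each group, and every remaining point joins the group of its nearest denser neighbour.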

Value

sf object with the original data and a grouping variable named 'ID.group'.

References

Rodriguez A, Laio A (2014). “Clustering by fast search and find of density peaks.” Science, 344(6191), 1492–1496. doi:10.1126/science.1242072.

Santafé G, Adin A, Lee D, Ugarte MD (2021). “Dealing with risk discontinuities to estimate cancer mortality risks when the number of small areas is large.” Statistical Methods in Medical Research, 30(1), 6–21. doi:10.1177/0962280220946502.

Wang G, Song Q (2016). “Automatic clustering via outward statistical testing on density metrics.” IEEE Transactions on Knowledge and Data Engineering, 28(8), 1971–1985. doi:10.1109/TKDE.2016.2535209.

Examples

## Not run: 
library(sf)
library(tmap)

## Load the Spain colorectal cancer mortality data ##
data(Carto_SpainMUN)

## Define a spatial partition using the DBSC algorithm ##
Carto_SpainMUN$logSMR <- log(Carto_SpainMUN$obs/Carto_SpainMUN$exp+0.0001)

carto.new <- clustering_partition(carto=Carto_SpainMUN, ID.area="ID", var="logSMR",
                                  n.cluster=20, l=2, min.size=100, verbose=TRUE)
table(carto.new$ID.group)

## Plot of the grouping variable 'ID.group' ##
carto.data <- st_set_geometry(carto.new, NULL)
carto.partition <- aggregate(carto.new[,"geometry"], list(ID.group=carto.data[,"ID.group"]), head)

tm_shape(carto.new) +
        tm_polygons(col="ID.group") +
        tm_shape(carto.partition) +
        tm_borders(col="black", lwd=2) +
        tm_layout(legend.outside=TRUE)

## End(Not run)


[Package bigDM version 0.5.3 Index]