clustering_partition {bigDM} | R Documentation |
Obtain a partition of the spatial domain using the density-based spatial clustering (DBSC) algorithm described in Santafé et al. (2021)
Description
The function takes an object of class SpatialPolygonsDataFrame or sf and defines a spatial partition using the DBSC algorithm described in Santafé et al. (2021).
Usage
clustering_partition(
carto,
ID.area = NULL,
var = NULL,
n.cluster = 10,
min.size = NULL,
W = NULL,
l = 1,
Wk = NULL,
distance = "euclidean",
verbose = TRUE
)
Arguments
carto |
object of class SpatialPolygonsDataFrame or sf. |
ID.area |
character; name of the variable that contains the IDs of spatial areal units. |
var |
character; name of the variable that contains the data of interest to compute spatial clusters, usually the vector of log-SMR. |
n.cluster |
numeric; number of cluster centers used by the DBSC algorithm. Defaults to 10. |
min.size |
numeric (default NULL) |
W |
optional argument with the binary adjacency matrix of the spatial areal units. If |
l |
numeric value with the neighbourhood order used to assign areas to each cluster. If |
Wk |
previously computed binary adjacency matrix of l-order neighbours. If this argument is included (default |
distance |
the distance measure to be used (default "euclidean") |
verbose |
logical value (default TRUE) |
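The relationship between W, l and Wk can be illustrated with a small base R sketch. It assumes that "l-order neighbours" means all areas reachable from a given area in at most l steps along the adjacency graph, which is one common reading and may differ from how bigDM builds Wk internally. The helper name neighbours_up_to_order and the toy matrix are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of bigDM): binary matrix marking every pair of
# areas that are connected by a path of at most l steps in the adjacency graph.
neighbours_up_to_order <- function(W, l = 1) {
  stopifnot(is.matrix(W), nrow(W) == ncol(W), l >= 1)
  reach <- W   # accumulated walk counts for lengths 1..k
  power <- W   # W^k, i.e. walks of exactly k steps
  for (k in seq_len(l - 1)) {
    power <- power %*% W
    reach <- reach + power
  }
  Wk <- (reach > 0) * 1  # back to a binary matrix
  diag(Wk) <- 0          # an area is not its own neighbour
  Wk
}

# Toy map (assumed): four areas in a chain, 1 - 2 - 3 - 4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

W2 <- neighbours_up_to_order(W, l = 2)
W2
```

With l = 1 the helper returns W unchanged; with l = 2, area 1 gains area 3 as a neighbour, while area 4 remains out of reach.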
Details
The DBSC algorithm implemented in this function is a new spatial clustering algorithm based on the density clustering algorithm introduced by Rodriguez and Laio (2014) and its subsequent modification by Wang and Song (2016). It obtains a single clustering partition of the data by automatically detecting cluster centers and assigning each area to its nearest cluster centroid. The algorithm is based on the assumption that cluster centers are points with high local density that lie relatively far from other points with higher local density. See Santafé et al. (2021) for more details.
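The density-peak assumption can be sketched in a few lines of base R. This is a simplified illustration of the general idea in Rodriguez and Laio (2014), not the DBSC implementation in this package: it ignores spatial adjacency, uses a Gaussian kernel with an assumed cutoff dc, and fixes the number of centers by hand. The toy values and all variable names are hypothetical.

```r
# Toy 1-D data (assumed): two well-separated groups of observations
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 5.3, 5.4)
d <- as.matrix(dist(x, method = "euclidean"))

# Local density: Gaussian kernel with an assumed cutoff distance dc.
# Subtracting 1 removes each point's contribution to its own density.
dc <- 0.2
rho <- rowSums(exp(-(d / dc)^2)) - 1

# delta: distance to the nearest point with strictly higher density.
# The global density peak has no such point, so it gets its largest distance.
delta <- sapply(seq_along(x), function(i) {
  higher <- which(rho > rho[i])
  if (length(higher) == 0) max(d[i, ]) else min(d[i, higher])
})

# Cluster centers combine high density with large delta
gamma <- rho * delta
centers <- order(gamma, decreasing = TRUE)[1:2]

# Assign every point to its nearest center
cluster <- apply(d[, centers], 1, which.min)

x[centers]
cluster
```

Points in the middle of each group score highest on gamma because they are both dense and far from any denser point, so they are picked as centers; every remaining point then joins the closer of the two.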
Value
sf object with the original data and a grouping variable named 'ID.group'.
References
Rodriguez A, Laio A (2014). “Clustering by fast search and find of density peaks.” Science, 344(6191), 1492–1496. doi:10.1126/science.1242072.
Santafé G, Adin A, Lee D, Ugarte MD (2021). “Dealing with risk discontinuities to estimate cancer mortality risks when the number of small areas is large.” Statistical Methods in Medical Research, 30(1), 6–21. doi:10.1177/0962280220946502.
Wang G, Song Q (2016). “Automatic clustering via outward statistical testing on density metrics.” IEEE Transactions on Knowledge and Data Engineering, 28(8), 1971–1985. doi:10.1109/TKDE.2016.2535209.
Examples
## Not run:
library(sf)
library(tmap)
## Load the Spain colorectal cancer mortality data ##
data(Carto_SpainMUN)
## Define a spatial partition using the DBSC algorithm ##
Carto_SpainMUN$logSMR <- log(Carto_SpainMUN$obs/Carto_SpainMUN$exp+0.0001)
carto.new <- clustering_partition(carto=Carto_SpainMUN, ID.area="ID", var="logSMR",
                                  n.cluster=20, l=2, min.size=100, verbose=TRUE)
table(carto.new$ID.group)
## Plot of the grouping variable 'ID.group' ##
carto.data <- st_set_geometry(carto.new, NULL)
carto.partition <- aggregate(carto.new[,"geometry"], list(ID.group=carto.data[,"ID.group"]), head)
tm_shape(carto.new) +
  tm_polygons(col="ID.group") +
  tm_shape(carto.partition) +
  tm_borders(col="black", lwd=2) +
  tm_layout(legend.outside=TRUE)
## End(Not run)