R: Identify significant spatial clusters of points

hotspot_gistar {sfhotspot}

R Documentation

Identify significant spatial clusters of points

Description

Identify hotspot and coldspot locations, that is, cells in a regular grid that contain more or fewer points than would be expected if the points were distributed randomly.

Usage

hotspot_gistar(
  data,
  cell_size = NULL,
  grid_type = "rect",
  kde = TRUE,
  bandwidth = NULL,
  bandwidth_adjust = 1,
  grid = NULL,
  weights = NULL,
  nb_dist = NULL,
  include_self = TRUE,
  p_adjust_method = NULL,
  quiet = FALSE,
  ...
)

Arguments

`data`	`sf` data frame containing points.
`cell_size`	`numeric` value specifying the size of each equally spaced grid cell, using the same units (metres, degrees, etc.) as used in the `sf` data frame given in the `data` argument. Ignored if `grid` is not `NULL`. If this argument and `grid` are `NULL` (the default), the cell size will be calculated automatically (see Details).
`grid_type`	`character` specifying whether the grid should be made up of squares (`"rect"`, the default) or hexagons (`"hex"`). Ignored if `grid` is not `NULL`.
`kde`	`TRUE` (the default) or `FALSE` indicating whether kernel density estimates (KDE) should be produced for each grid cell.
`bandwidth`	`numeric` value specifying the bandwidth to be used in calculating the kernel density estimates. If this argument is `NULL` (the default), the bandwidth will be specified automatically using the mean result of `bandwidth.nrd` called on the `x` and `y` co-ordinates separately.
`bandwidth_adjust`	single positive `numeric` value by which the value of `bandwidth` is multiplied. Useful for setting the bandwidth relative to the default.
`grid`	`sf` data frame containing polygons, which will be used as the grid for which counts are made.
`weights`	`NULL` or the name of a column in `data` to be used as weights for weighted counts and KDE values.
`nb_dist`	The distance around a cell that contains the neighbours of that cell, which are used in calculating the statistic. If this argument is `NULL` (the default), `nb_dist` is set as `cell_size * sqrt(2)` so that only the cells immediately adjacent to each cell are treated as being its neighbours.
`include_self`	Should points in a given cell be counted together with points in neighbouring cells when calculating the statistic? If `include_self = TRUE` (the default), the function calculates G_i^*; if `include_self = FALSE`, it calculates G_i. You are unlikely to want to change the default value.
`p_adjust_method`	The method to be used to adjust p-values for multiple comparisons. `NULL` (the default) uses the default method used by `p.adjust`, but any of the character values in `stats::p.adjust.methods` may be specified.
`quiet`	if set to `TRUE`, messages reporting the values of any parameters set automatically will be suppressed. The default is `FALSE`.
`...`	Further arguments passed to the `kde` function, or ignored if `kde = FALSE`.
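
The default for `nb_dist` described above can be checked with a small base R sketch. The 5 x 5 grid of cell centroids, the 200-unit cell size and the floating-point tolerance `tol` are illustrative assumptions, not part of the package; the sketch only shows that a distance of `cell_size * sqrt(2)` between centroids captures the eight cells surrounding a square cell and nothing further away.

```r
# Centroids of a hypothetical 5 x 5 grid of square cells, each 200 units wide
cell_size <- 200
centroids <- expand.grid(x = seq(100, 900, by = 200), y = seq(100, 900, by = 200))

# Distance from the centre cell (centroid at 500, 500) to every cell centroid
centre <- which(centroids$x == 500 & centroids$y == 500)
d <- sqrt((centroids$x - centroids$x[centre])^2 +
          (centroids$y - centroids$y[centre])^2)

# Default neighbour distance as described for `nb_dist`
nb_dist <- cell_size * sqrt(2)

# Assumption: a small tolerance guards against floating-point error when a
# diagonal distance is compared with cell_size * sqrt(2)
tol <- 1e-8

# Cells within nb_dist, excluding the centre cell itself
n_neighbours <- sum(d > 0 & d <= nb_dist + tol)

# For comparison: a threshold of cell_size alone misses the diagonal cells
n_edge_only <- sum(d > 0 & d <= cell_size + tol)

c(within_nb_dist = n_neighbours, within_cell_size = n_edge_only)
```

For hexagonal grids or a custom `grid`, the appropriate distance may differ, so it is worth checking `nb_dist` explicitly in those cases.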

Details

This function calculates the Getis-Ord G_i^* (gi-star) or G_i Z-score statistic for identifying clusters of point locations. The underlying implementation uses the localG function to calculate the Z scores and then the p.adjustSP function to adjust the corresponding p-values for multiple comparisons. The function also returns counts of points in each cell and, unless `kde = FALSE`, kernel density estimates calculated using the kde function.
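
As a rough illustration of why p-values are adjusted, the following base R sketch converts a handful of made-up Z scores to p-values and adjusts them. It uses `stats::p.adjust` with the `"holm"` method as a simpler stand-in for p.adjustSP, and it assumes two-sided p-values; both are assumptions made for illustration and may not match what the function does internally.

```r
# Made-up Z scores for six grid cells (illustrative values only)
z <- c(-3.2, -0.4, 0.1, 1.7, 2.6, 4.1)

# Assumption: two-sided p-values derived from the standard normal distribution
p_raw <- 2 * pnorm(-abs(z))

# Adjust for multiple comparisons; stats::p.adjust stands in for p.adjustSP
p_holm <- p.adjust(p_raw, method = "holm")

# Flag cells with a positive Z score that remain significant after adjustment
hotspot <- z > 0 & p_holm < 0.05

data.frame(z, p_raw, p_holm, hotspot)
```

Adjusted p-values are never smaller than the raw values, so fewer cells are flagged after adjustment than before.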

Coverage of the output data

The grid produced by this function covers the convex hull of the input data layer. This means the result may include G_i^* or G_i values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
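
The cropping step might look like the sketch below. The `district` polygon and the `result` grid are hypothetical stand-ins built with sf so that the example is self-contained; in practice `result` would be the output of `hotspot_gistar()` and `district` a real boundary layer in the same co-ordinate reference system.

```r
library(sf)

# Hypothetical district boundary: a 1 km square in UTM zone 15N (EPSG:32615)
district <- st_sfc(
  st_polygon(list(rbind(
    c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
  ))),
  crs = 32615
)

# Stand-in for hotspot_gistar() output: a grid extending beyond the district
cells <- st_make_grid(st_buffer(district, 400), cellsize = 300)
result <- st_sf(n = seq_along(cells), geometry = cells)

# Keep only the parts of each cell that fall inside the district; sf warns
# that attributes are assumed to be constant over each geometry
cropped <- st_intersection(result, district)

nrow(result)
nrow(cropped)
```

Cells that straddle the boundary are clipped rather than dropped, so their counts and statistic values still describe the original, larger cell.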

Automatic cell-size selection

If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the `sf` object given in `data` is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
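
The rule above can be sketched in base R as follows. The function `sketch_cell_size()` and its arguments are hypothetical and written only to illustrate the description; the rounding actually used by the package may differ, in particular for small study areas, where this sketch simply falls back to the unrounded value.

```r
# Hypothetical helper illustrating the rule described above; not package code
sketch_cell_size <- function(width, height, projected = TRUE) {
  # Start with 50 cells along the shorter side of the grid
  size <- min(width, height) / 50

  if (projected) {
    # Round down to a multiple of 100, which increases the number of cells
    rounded <- floor(size / 100) * 100
    # Assumption: keep the unrounded size if rounding would give zero
    if (rounded > 0) size <- rounded
  }

  size
}

# A 12,340 m x 20,000 m area: 12340 / 50 = 246.8, rounded down to 200
size_m <- sketch_cell_size(12340, 20000)
cells_on_short_side <- ceiling(12340 / size_m)

# Lon/lat data (degrees): no rounding is applied
size_deg <- sketch_cell_size(0.5, 0.3, projected = FALSE)

c(size_m = size_m, cells_on_short_side = cells_on_short_side, size_deg = size_deg)
```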

Value

An sf tibble of regular grid cells with corresponding point counts, G_i^* or G_i values and (optionally) kernel density estimates for each cell. Values of the statistic greater than zero indicate more points than would be expected for randomly distributed points and values less than zero indicate fewer points. Critical values of G_i^* and G_i are given in the manual page for localG.

The output from this function can be plotted in the same way as other sf objects; see vignette("sf5", package = "sf").

References

Getis, A. & Ord, J. K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24(3), 189-206. doi:10.1111/j.1538-4632.1992.tb00261.x

Examples

library(sf)

# Transform data to UTM zone 15N so that cell_size and bandwidth can be set
# in metres
memphis_robberies_utm <- st_transform(memphis_robberies_jan, 32615)

# Automatically set grid-cell size, bandwidth and neighbour distance

hotspot_gistar(memphis_robberies_utm)


# Manually set grid-cell size in metres, since the transformed
# `memphis_robberies_utm` data uses a co-ordinate reference system (UTM zone
# 15 north) that is specified in metres

hotspot_gistar(memphis_robberies_utm, cell_size = 200)


# Automatically set grid-cell size for lon/lat data, since it is not
# intuitive to set this value manually in decimal degrees. KDEs must be
# switched off for lon/lat data due to a limitation in the underlying
# function.

hotspot_gistar(memphis_robberies, kde = FALSE)

[Package sfhotspot version 0.8.0 Index]