R: Estimate the relationship between the kernel density of two...

hotspot_dual_kde {sfhotspot}

R Documentation

Estimate the relationship between the kernel density of two layers of points

Description

Estimate the relationship between the kernel density of two layers of points

Usage

hotspot_dual_kde(
  x,
  y,
  cell_size = NULL,
  grid_type = "rect",
  bandwidth = NULL,
  bandwidth_adjust = 1,
  method = "ratio",
  grid = NULL,
  weights = NULL,
  quiet = FALSE,
  ...
)

Arguments

`x`, `y`	`sf` data frames containing points.
`cell_size`	`numeric` value specifying the size of each equally spaced grid cell, using the same units (metres, degrees, etc.) as used in the `sf` data frame given in the `x` argument. Ignored if `grid` is not `NULL`. If this argument and `grid` are `NULL` (the default), the cell size will be calculated automatically (see Details).
`grid_type`	`character` specifying whether the grid should be made up of squares (`"rect"`, the default) or hexagons (`"hex"`). Ignored if `grid` is not `NULL`.
`bandwidth`	either a single `numeric` value specifying the bandwidth to be used in calculating the kernel density estimates, or a list of exactly 2 such values. If this argument is `NULL` (the default), the bandwidth for both `x` and `y` will be determined automatically using the result of `bandwidth.nrd` called on the co-ordinates of the points in `x`. If this argument is `list(NULL, NULL)`, separate bandwidths will be determined automatically for `x` and `y` based on each layer.
`bandwidth_adjust`	single positive `numeric` value by which the value of `bandwidth` for both `x` and `y` will be multiplied, or a list of two such values. Useful for setting the bandwidth relative to the default.
`method`	`character` specifying the method by which the densities, `d()`, of `x` and `y` will be related: `ratio` (the default) calculates the density of `x` divided by the density of `y`, i.e. `d(x) / d(y)`. `log` calculates the natural logarithm of the density of `x` divided by the density of `y`, i.e. `log(d(x) / d(y))`. `diff` calculates the difference between the density of `x` and the density of `y`, i.e. `d(x) - d(y)`. `sum` calculates the sum of the density of `x` and the density of `y`, i.e. `d(x) + d(y)`. The result of this calculation will be returned in the `kde` column of the return value.
`grid`	`sf` data frame containing polygons, which will be used as the grid for which densities are estimated.
`weights`	`NULL` (the default) or a vector of length two giving either `NULL` or the name of a column in each of `x` and `y` to be used as weights for weighted counts and KDE values.
`quiet`	if set to `TRUE`, messages reporting the values of any parameters set automatically will be suppressed. The default is `FALSE`.
`...`	Further arguments passed to `kde`.

Value

An sf tibble of grid cells with corresponding point counts and dual kernel density estimates for each cell. This can be plotted using autoplot.

This function creates a regular two-dimensional grid of cells (unless a custom grid is specified with grid), calculates the density of points in each cell for each of x and y using functions from the SpatialKDE package, then produces a value representing a relation between the two densities. The count of points in each cell is also returned.

Dual kernel density values can be useful for understanding the relationship between the distributions of two sets of point locations. For example:

The ratio between two densities representing the locations of burglaries and the locations of houses can show the distribution of the risk (incidence rate) of burglaries. The logged ratio may be useful to show relationships where one set of points has an extremely skewed distribution.
The difference between two densities can show the change in distributions between two points in time.
The sum of two densities can be used to estimate the total density of two types of point, e.g. the locations of occurrences of two diseases.

Coverage of the output data

The grid produced by this function covers the convex hull of the points in x. This means the result may include KDE values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.

Automatic cell-size selection

If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the x SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.

References

Yin, P. (2020). Kernels and Density Estimation. The Geographic Information Science & Technology Body of Knowledge (1st Quarter 2020 Edition), John P. Wilson (ed.). doi:doi:10.22224/gistbok/2020.1.12

Examples

# See also the examples for `hotspot_kde()` for examples of how to specify
# `cell_size`, `bandwidth`, etc.

library(sf)

# Transform data to UTM zone 15N so that cell_size and bandwidth can be set
# in metres
memphis_robberies_utm <- st_transform(memphis_robberies, 32615)
memphis_population_utm <- st_transform(memphis_population, 32615)

# Calculate burglary risk based on residential population. `weights` is set
# to `c(NULL, population)` so that the robberies layer is not weighted and
# the population layer is weighted according to the number of residents in
# each census block.

hotspot_dual_kde(
  memphis_robberies_utm,
  memphis_population_utm,
  bandwidth = list(NULL, NULL),
  weights = c(NULL, population)
)

[Package sfhotspot version 0.8.0 Index]