cookies {divvy} | R Documentation |
Rarefy localities within circular regions of standard area
Description
Spatially subsample a dataset to produce samples of standard area and extent.
Usage
cookies(
  dat,
  xy,
  iter,
  nSite,
  r,
  weight = FALSE,
  crs = "epsg:4326",
  output = "locs"
)
Arguments
dat |
A data.frame or matrix containing the coordinate columns specified by xy. |
xy |
A vector of two elements, specifying the name or numeric position
of columns in dat that contain the site coordinates. |
iter |
The number of spatial subsamples to return |
nSite |
The quota of unique locations to include in each subsample. |
r |
Numeric value for the radius (km) defining the circular extent of each spatial subsample. |
weight |
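Whether sites within the subsample radius should be drawn
at random (FALSE, default) or with probability proportional to the
inverse square of their distance from the seed point (TRUE). |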
crs |
Coordinate reference system as a GDAL text string, EPSG code,
or object of class |
output |
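Whether the returned data should be two columns of
subsample site coordinates (output = 'locs', default) or all columns of
dat for the rows associated with those sites (output = 'full'). |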
Details
The function takes a single location as a starting (seed) point and
circumscribes a buffer of r km around it. Buffer circles that span
the antemeridian (180 degrees longitude) are wrapped as a multipolygon
to prevent artificial truncation. After standardising radial extent, sites
are drawn within the circular extent until a quota of nSite is met.
Sites are sampled without replacement, so a location is used as a seed point
only if it is within r km distance of at least nSite-1 locations.
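The seed rule above can be sketched in base R. This is a minimal illustration, not the divvy implementation: it assumes planar coordinates already expressed in km, equal-probability draws from every site inside the circle (seed included), and a made-up `sites` table.

```r
# Hypothetical illustration in base R; not the code divvy runs internally.
set.seed(1)

# Ten sites on a plane, coordinates in km (assumed Cartesian for simplicity)
sites <- data.frame(x = seq(0, 450, length.out = 10),
                    y = seq(0, 450, length.out = 10))
r     <- 200  # subsample radius, km
nSite <- 3    # quota of sites per subsample

# Pairwise Euclidean distances between all sites
d <- as.matrix(dist(sites))

# Count the other sites inside each candidate circle (exclude the centre)
nNeighbours <- rowSums(d <= r) - 1

# A site qualifies as a seed only if its circle holds nSite - 1 other sites
seeds <- which(nNeighbours >= nSite - 1)

# Draw one subsample: pick a seed, then sample the quota without replacement
# from all sites inside its circle (assumption: the seed is part of the pool)
seed   <- seeds[sample.int(length(seeds), 1)]
pool   <- which(d[seed, ] <= r)
picked <- pool[sample.int(length(pool), nSite)]
subsample <- sites[picked, ]
```

Because the quota is filled without replacement, a circle containing fewer than nSite sites could never be completed, which is why such centres are excluded before any draw is made.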
The method is introduced in Antell et al. (2020) and described in
detail in Methods S1 therein.
The probability of drawing each site within the standardised extent is
either equal (weight = FALSE) or proportional to the inverse square
of its distance from the seed point (weight = TRUE), which clusters
subsample locations more tightly.
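To see how strongly inverse-square weighting favours nearby sites, the sketch below converts a set of distances into draw probabilities. The distances and site labels are invented for illustration, and the use of `sample()` with `prob` is an assumption about how such a draw could be made, not a statement of what divvy does internally.

```r
# Hypothetical distances (km) from a seed point to four sites in its circle
distKm <- c(a = 50, b = 100, c = 150, d = 200)

# Weight each site by the inverse square of its distance, then normalise
# so the weights sum to 1 and can be read as first-draw probabilities
w <- 1 / distKm^2
p <- w / sum(w)

# Relative odds of the nearest versus the farthest site: (200 / 50)^2
oddsNearVsFar <- p[["a"]] / p[["d"]]

# Weighted draw of two sites without replacement
set.seed(1)
drawn <- sample(names(distKm), size = 2, prob = p)
```

A site four times farther from the seed is sixteen times less likely to be drawn first, so weighted subsamples tend to sit close to the seed.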
For geodetic coordinates (latitude-longitude), distances are calculated along great circle arcs. For Cartesian coordinates, distances are calculated in Euclidean space, in units associated with the projection CRS (e.g. metres).
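The difference between the two distance types can be illustrated with a haversine calculation. This is a sketch under stated assumptions: a spherical Earth of radius 6371 km and a hypothetical helper `haversineKm()`; the routine divvy actually calls may use a different Earth model.

```r
# Great-circle distance (km) via the haversine formula, assuming a spherical
# Earth of radius 6371 km. Hypothetical helper, not part of divvy.
haversineKm <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  # pmin() guards against rounding error pushing sqrt(a) above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# End points of the transect used in the Examples section
gcKm <- haversineKm(140, -20, 145, -25)

# The same two points treated as planar coordinates: the result is in
# degrees, not km, so it cannot be compared against a radius given in km
planarDeg <- sqrt((145 - 140)^2 + (-25 - -20)^2)
```

With latitude-longitude input, r is interpreted in km along the sphere. With projected coordinates, distances come back in the units of the projection, so it is worth checking that r is expressed on the same scale.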
Value
A list of length iter. Each list element is a data.frame or matrix
(matching the class of dat) with nSite observations. If output = 'locs'
(default), only the coordinates of subsampling locations are returned.
If output = 'full', all dat columns are returned for the
rows associated with the subsampled locations.
If weight = TRUE, the first observation in each returned subsample
data.frame corresponds to the seed point. If weight = FALSE,
observations are listed in the random order in which they were drawn.
References
Antell GT, Kiessling W, Aberhan M, Saupe EE (2020). “Marine biodiversity and geographic distributions are independent on large scales.” Current Biology, 30(1), 115-121. doi:10.1016/j.cub.2019.10.065.
Examples
# generate occurrences: 10 lat-long points in modern Australia
n <- 10
x <- seq(from = 140, to = 145, length.out = n)
y <- seq(from = -20, to = -25, length.out = n)
pts <- data.frame(x, y)
# sample 5 sets of 3 occurrences within 200km radius
cookies(dat = pts, xy = 1:2, iter = 5,
        nSite = 3, r = 200)