| grts {spsurvey} | R Documentation |
Select a generalized random tessellation stratified (GRTS) sample
Description
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
Usage
grts(
sframe,
n_base,
stratum_var = NULL,
seltype = NULL,
caty_var = NULL,
caty_n = NULL,
aux_var = NULL,
legacy_var = NULL,
legacy_sites = NULL,
legacy_stratum_var = NULL,
legacy_caty_var = NULL,
legacy_aux_var = NULL,
mindis = NULL,
maxtry = 10,
n_over = NULL,
n_near = NULL,
wgt_units = NULL,
pt_density = NULL,
DesignID = "Site",
SiteBegin = 1,
sep = "-",
projcrs_check = TRUE
)
Arguments
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A character vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A character string indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
Details
n_base is the number of sites used to calculate
the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the
number of sites in all panels that will be sampled in the same temporal period;
it is not the total number of sites in all panels. The sum of n_base and
n_over equals the total number of sites to be visited across all panels plus
any replacement sites that may be required.
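The relationship between n_base and n_over in a panel design can be worked through with a small arithmetic sketch. The panel counts and replacement allowance below are hypothetical numbers chosen only for illustration, and the variable names other than n_base and n_over are not part of the grts() interface.

```r
# Hypothetical panel design (all numbers are illustrative assumptions)
n_panels        <- 4   # number of panels, with one panel visited per period
sites_per_panel <- 25  # sites in each panel
n_replacement   <- 10  # replacement sites held in reserve

# Sites sampled in the same temporal period (one panel per period here)
n_base <- sites_per_panel

# Total sites to be visited across all panels, plus replacement sites
n_total <- n_panels * sites_per_panel + n_replacement

# n_base + n_over must equal n_total, so solve for n_over
n_over <- n_total - n_base

c(n_base = n_base, n_over = n_over)
```

Under these assumptions, the design weights would be computed from the 25 sites in n_base, while the remaining panel sites and replacements are requested through n_over.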
Value
The sampling design sites and additional information about the sampling design. More specifically, a list with five elements:
- sites_legacy: An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base: An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over: An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near: An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design: A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
  - call: The original function call.
  - stratum_var: The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum: The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base: The base sample size per stratum.
  - seltype: The selection type per stratum.
  - caty_var: The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n: The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var: The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy: A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var: The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var: The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var: The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis: The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over: The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is unequal, this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near: The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base,
sites_over, and sites_near objects contain the original columns
in sframe and include a few additional columns. These additional columns
are
- siteID: A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
- siteuse: Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
- replsite: The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
- lon_WGS84: Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- lat_WGS84: Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- X: Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- Y: Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- stratum: A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
- wgt: The design weight.
- ip: The site's original inclusion probability (the reciprocal of wgt).
- caty: An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
- aux: The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns
from sframe will be automatically prefixed with sframe_
in the sites object. When output is printed, a summary of site counts by
the levels in stratum_var and caty_var is shown.
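As a minimal sketch of how the returned list might be inspected, the snippet below selects a small sample and looks at a few of the elements and columns described above. It assumes the spsurvey package is installed and uses its NE_Lakes example data; the seed value and sample sizes are arbitrary choices.

```r
library(spsurvey)
set.seed(1)  # arbitrary seed so the draw can be repeated

# Unstratified sample with 20 base sites and 5 replacement sites
samp <- grts(NE_Lakes, n_base = 20, n_over = 5)

names(samp)                      # top-level elements of the returned list
is.null(samp$sites_legacy)       # no legacy sites were requested in this call
head(samp$sites_base$siteID)     # identifiers built from DesignID and SiteBegin
table(samp$sites_base$siteuse)   # how each returned site is classified
table(samp$sites_over$replsite)  # ordering labels for the replacement sites
samp$design$n_base               # base sample size recorded in the design list
```

Checking samp$design after a run is a quick way to confirm that the arguments were interpreted as intended.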
Author(s)
Tony Olsen olsen.tony@epa.gov
References
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
See Also
irs to select a sample that is not spatially balanced
Examples
## Not run:
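# Unstratified, equal probability sample of 100 sites
samp <- grts(NE_Lakes, n_base = 100)
print(samp)
# Stratified sample: the names of n_base must match the values of stratum_var
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
# Unstratified sample with 5 reverse hierarchically ordered replacement sites
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)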
## End(Not run)
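The other inclusion probability types and the nearest neighbor replacement option could be exercised along the following lines. This is a sketch, not part of the package examples. It assumes NE_Lakes contains an AREA_CAT column with levels "small" and "large" and a positive numeric AREA column; the sample sizes and seed are arbitrary.

```r
library(spsurvey)
set.seed(2024)  # arbitrary seed so the draws can be repeated

# Unequal inclusion probabilities by category
# (assumes an AREA_CAT column with levels "small" and "large");
# the expected sizes in caty_n sum to n_base
caty_n <- c(small = 10, large = 20)
samp_unequal <- grts(
  NE_Lakes,
  n_base = 30,
  caty_var = "AREA_CAT",
  caty_n = caty_n
)

# Inclusion probabilities proportional to a positive auxiliary variable
# (assumes a positive numeric AREA column)
samp_prop <- grts(NE_Lakes, n_base = 30, aux_var = "AREA")

# Equal probabilities with one nearest neighbor replacement site per base site
samp_near <- grts(NE_Lakes, n_base = 30, n_near = 1)

print(samp_unequal)
print(samp_prop)
print(samp_near)
```

Here seltype is left at its default of NULL and the selection type is implied by which of caty_var or aux_var is supplied; passing seltype explicitly is an alternative if the intent should be stated in the call.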