grts {spsurvey} | R Documentation |
Select a generalized random tessellation stratified (GRTS) sample
Description
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
Usage
grts(
sframe,
n_base,
stratum_var = NULL,
seltype = NULL,
caty_var = NULL,
caty_n = NULL,
aux_var = NULL,
legacy_var = NULL,
legacy_sites = NULL,
legacy_stratum_var = NULL,
legacy_caty_var = NULL,
legacy_aux_var = NULL,
mindis = NULL,
maxtry = 10,
n_over = NULL,
n_near = NULL,
wgt_units = NULL,
pt_density = NULL,
DesignID = "Site",
SiteBegin = 1,
sep = "-",
projcrs_check = TRUE
)
Arguments
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A numeric vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A numeric value indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
Details
n_base is the number of sites used to calculate the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the number of sites in all panels that will be sampled in the same temporal period; n_base is not the total number of sites in all panels. The sum of n_base and n_over is equal to the total number of sites to be visited for all panels plus any replacement sites that may be required.
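The bookkeeping in the paragraph above can be sketched with a little arithmetic. This is only an illustration under one reading of that paragraph: it assumes a hypothetical design with three panels of 20 sites each, one panel visited per temporal period, and 10 replacement sites. The panel sizes and variable names are invented for the example and are not part of spsurvey.

```r
# Hypothetical panel design (all numbers are assumptions for illustration)
n_panels        <- 3    # number of panels
sites_per_panel <- 20   # sites in each panel
n_replacement   <- 10   # extra replacement sites wanted

# Sites sampled in the same temporal period (one panel per period here)
n_base <- sites_per_panel

# n_base + n_over should cover every panel plus the replacement sites
total_needed <- n_panels * sites_per_panel + n_replacement
n_over <- total_needed - n_base

c(n_base = n_base, n_over = n_over, total = n_base + n_over)
```

Under these assumptions, the resulting n_base and n_over values would be the ones passed to grts().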
Value
The sampling design sites and additional information about the sampling design. More specifically, a list with five elements:
- sites_legacy: An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base: An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over: An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near: An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design: A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
  - call: The original function call.
  - stratum_var: The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum: The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base: The base sample size per stratum.
  - seltype: The selection type per stratum.
  - caty_var: The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n: The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var: The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy: A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var: The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var: The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var: The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis: The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over: The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is "unequal", this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near: The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
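A minimal sketch of how the returned list might be inspected is given here. It assumes the spsurvey package and its bundled NE_Lakes data are installed, and it is skipped otherwise. The seed value is arbitrary, and the element names are the ones listed above.

```r
# Runs only when spsurvey is installed (assumption: NE_Lakes ships with it)
if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)
  set.seed(51)  # arbitrary seed so the selection can be repeated

  samp <- grts(NE_Lakes, n_base = 30, n_over = 5)

  # Top-level elements of the returned list
  print(names(samp))

  # No legacy sites were supplied, so this element should be NULL
  print(is.null(samp$sites_legacy))

  # Check that the design ran with the intended specifications
  print(samp$design$n_base)
  print(samp$design$n_over)
  print(samp$design$stratum)
}
```

Looking at the design element after each call is a cheap way to catch a mistyped argument before any sites are visited.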
When non-NULL, the sites_legacy, sites_base, sites_over, and sites_near objects contain the original columns in sframe and include a few additional columns. These additional columns are:
- siteID: A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
- siteuse: Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
- replsite: The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
- lon_WGS84: Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- lat_WGS84: Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- X: Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- Y: Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- stratum: A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
- wgt: The design weight.
- ip: The site's original inclusion probability (the reciprocal of wgt).
- caty: An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
- aux: The auxiliary proportional probability variable. This column is only returned if seltype was "proportional" in the original sampling design.
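The reciprocal relationship between wgt and ip can be illustrated with a few lines of base R. The sketch assumes a hypothetical equal probability sample of 50 sites from a finite frame of 195 points, so every site has inclusion probability 50/195. The numbers are illustrative and are not the output of a grts() call.

```r
# Hypothetical frame and sample sizes (assumptions for illustration only)
N_frame  <- 195  # number of points in the sampling frame
n_sample <- 50   # base sample size

# Under equal probability sampling each site has the same inclusion probability
ip <- rep(n_sample / N_frame, n_sample)

# The design weight is the reciprocal of the inclusion probability
wgt <- 1 / ip

# The weights add up to the frame size, up to floating point error
sum(wgt)
```

In this setting each sampled site stands in for N_frame / n_sample points of the frame, which is why the weights sum back to the frame size.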
If any columns in sframe contain these names, those columns from sframe will be automatically prefixed with sframe_ in the sites object. When output is printed, a summary of site counts by the levels in stratum_var and caty_var is shown.
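The renaming rule can be pictured with a short base R sketch. This illustrates the described outcome only, not spsurvey's internal code. The frame column names (LAKE_ID, AREA) are hypothetical, and the reserved names are the additional columns listed above.

```r
# Column names that grts() adds to the sites objects (from the list above)
reserved <- c("siteID", "siteuse", "replsite", "lon_WGS84", "lat_WGS84",
              "X", "Y", "stratum", "wgt", "ip", "caty", "aux")

# Hypothetical column names in a user's sampling frame
frame_cols <- c("LAKE_ID", "wgt", "stratum", "AREA")

# Prefix only the columns whose names clash with a reserved name
clash <- frame_cols %in% reserved
renamed <- frame_cols
renamed[clash] <- paste0("sframe_", frame_cols[clash])

renamed
```

Renaming such columns in sframe before calling grts() avoids relying on the automatic prefix.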
Author(s)
Tony Olsen olsen.tony@epa.gov
References
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
See Also
irs to select a sample that is not spatially balanced
Examples
## Not run:
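# Unstratified sample of 100 sites
samp <- grts(NE_Lakes, n_base = 100)
print(samp)
# Stratified sample: the names of strata_n must match the values of ELEV_CAT
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
# Unstratified sample with 5 reverse hierarchically ordered replacement sites
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)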
## End(Not run)
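A few further calls, sketched here, exercise the other options mentioned in the Description. They assume that spsurvey and its NE_Lakes data are installed, that NE_Lakes has a positive AREA column, and that its coordinates are in a projected system so that mindis is a distance in that system's units. The sample sizes and the mindis value are arbitrary choices for illustration.

```r
# Runs only when spsurvey is installed (assumption: NE_Lakes ships with it)
if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)

  # Unequal inclusion probabilities by elevation category;
  # the names of caty_n match the levels of ELEV_CAT
  caty_n <- c(low = 30, high = 20)
  samp_uneq <- grts(NE_Lakes, n_base = 50, seltype = "unequal",
                    caty_var = "ELEV_CAT", caty_n = caty_n)

  # Inclusion probabilities proportional to a positive auxiliary variable
  samp_prop <- grts(NE_Lakes, n_base = 50, seltype = "proportional",
                    aux_var = "AREA")

  # Desired minimum distance between sites (units of the frame's CRS)
  samp_mindis <- grts(NE_Lakes, n_base = 50, mindis = 1600)

  # Nearest neighbor replacement sites
  samp_near <- grts(NE_Lakes, n_base = 50, n_near = 2)

  print(samp_uneq)
}
```

Each call returns the same list structure described under Value, so the results can be inspected in the same way as the examples above.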