sampcont {MapGAM} | R Documentation |
Unmatched Control Sampling
Description
Take all cases and a random sample of controls from a data frame. Simple random sampling and spatially stratified random sampling are available. For spatially stratified random sampling, strata can be defined by region, or by region and additional stratification variables (see Tang et al., 2023 for examples and simulation comparisons). If no regions are specified with stratified sampling, the function creates a regular grid for spatially stratified sampling.
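The idea of grid-based stratified control sampling can be sketched in base R as follows. This is an illustration only, not the code used inside sampcont: the data frame `toy`, the grid dimensions, and the helper names are all hypothetical, and grid cells are formed here with `cut()` over the observed coordinate ranges, which may differ from how the package builds its grid.

```r
# Hypothetical toy data: case = 1 for cases, 0 for controls, with x/y coordinates
set.seed(1)
toy <- data.frame(case = rbinom(500, 1, 0.1),
                  x = runif(500), y = runif(500))

# Assign each row to a cell of a regular grid (assumed 4 rows by 5 columns)
nrow_grid <- 4
ncol_grid <- 5
row_id <- cut(toy$y, breaks = nrow_grid, labels = FALSE)
col_id <- cut(toy$x, breaks = ncol_grid, labels = FALSE)
toy$cell <- (row_id - 1) * ncol_grid + col_id

# Keep every case; draw up to n controls from each grid cell
n <- 2
controls <- which(toy$case == 0)
picked <- unlist(lapply(split(controls, toy$cell[controls]), function(idx) {
  # take all controls when a cell has n or fewer, otherwise sample n of them
  if (length(idx) <= n) idx else idx[sample.int(length(idx), n)]
}))
samp <- toy[sort(c(which(toy$case == 1), picked)), ]
```

Sampling within cells spreads the controls across the study area, whereas a simple random sample concentrates them wherever the source population is densest.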
Usage
sampcont(rdata, type = "stratified", casecol=1, Xcol=2, Ycol=3, regions = NULL,
addstrat = NULL, times = NULL, n = 1, nrow = 100, ncol = 100)
Arguments
rdata |
a data frame with case status in the |
casecol |
the column number in |
Xcol |
the column number in |
Ycol |
the column number in |
type |
|
regions |
a vector of length equal to the number of rows in |
addstrat |
a vector of length equal to the number of rows in |
times |
included for backward compatibility; now replaced by the |
n |
the number of controls to sample from the eligible controls in each stratum. All available controls will be taken for strata with fewer than |
nrow |
the number of rows used to create a regular grid for sampling regions. Only used when |
ncol |
the number of columns used to create a regular grid for sampling regions. Only used when |
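As a rough illustration of strata defined by region together with an additional stratification variable, the two vectors can be crossed so that each observed combination forms one stratum. The vectors `region` and `agegrp` below are made-up values, and crossing them with `interaction()` is an assumption about how such strata could be formed, not a description of the function's internals.

```r
# Hypothetical region labels and an additional stratification variable,
# one entry per row of the data frame
region <- c("north", "north", "south", "south", "south", "north")
agegrp <- c("young", "old", "young", "young", "old", "young")

# One stratum per observed region-by-variable combination
strata <- interaction(region, agegrp, drop = TRUE)

# Number of rows falling in each combined stratum
table(strata)
```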
Value
rdata |
a data frame with all cases and a random sample of controls. |
w |
the inverse probability weights for the rows in |
ncont |
the total number of controls in the sample. |
type |
stratified or simple sampling, as specified by the same argument described above. |
gridsize |
a vector with the numbers of rows and columns for the stratified sampling grid. |
grid |
the stratified sampling grid in PolySet format. |
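To see why inverse probability weights matter, the following base R sketch builds weights by hand for a stratified control sample. It assumes a common construction (weight 1 for cases, since all are kept, and eligible controls divided by sampled controls within each stratum); the package may compute its weights differently, and `src`, `stratum`, and `n` here are hypothetical.

```r
set.seed(2)
# Hypothetical source data with two strata of unequal size
src <- data.frame(case = rbinom(1000, 1, 0.08),
                  stratum = rep(c("A", "B"), times = c(800, 200)))

n <- 50  # controls to draw per stratum (assumed value)
ctrl <- which(src$case == 0)
by_stratum <- split(ctrl, src$stratum[ctrl])
picked <- lapply(by_stratum, function(idx) {
  idx[sample.int(length(idx), min(n, length(idx)))]
})

# Control weight = eligible controls / sampled controls in the same stratum
w_ctrl <- unlist(lapply(names(picked), function(s) {
  rep(length(by_stratum[[s]]) / length(picked[[s]]), length(picked[[s]]))
}))

cases <- which(src$case == 1)
samp <- src[c(cases, unlist(picked)), ]
w <- c(rep(1, length(cases)), w_ctrl)  # every case is kept, so weight 1

# The weighted prevalence recovers the source prevalence;
# the unweighted sample prevalence overstates it
weighted.mean(samp$case, w = w)
mean(src$case)
mean(samp$case)
```

Because the weighted control total equals the number of controls in the source data, the weighted mean reproduces the source prevalence under this construction.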
Author(s)
Scott M. Bartell and Ian W. Tang sbartell@uci.edu.
References
Tang IW, Bartell SM, Vieira VM. Unmatched Spatially Stratified Controls: A simulation study examining efficiency and precision using spatially-diverse controls and generalized additive models. Spatial and Spatio-temporal Epidemiology 2023, 45:100584. doi:10.1016/j.sste.2023.100584
See Also
Examples
#### load beertweets data, which has 719 cases and 9281 controls
data(beertweets)
# take a simple random sample of 1000 controls
samp1 <- sampcont(beertweets, type="simple", n=1000)
# take a stratified random sample of controls on an 80x50 grid
samp2 <- NULL
samp2 <- sampcont(beertweets, nrow=80, ncol=50)
# Compare locations for the two sampling designs (cases in red)
par(mfrow=c(2,1), mar=c(0,3,4,3))
plot(samp1$rdata$longitude, samp1$rdata$latitude, col=3-samp1$rdata$beer,
cex=0.5, type="p", axes=FALSE, ann=FALSE)
# Show US base map if maps package is available
mapUS <- require(maps)
if (mapUS) map("state", add=TRUE)
title("Simple Random Sample, 1000 Controls")
if (!is.null(samp2)) {
plot(samp2$rdata$longitude, samp2$rdata$latitude,
col=3-samp2$rdata$beer, cex=0.5, type="p", axes=FALSE,
ann=FALSE)
if (mapUS) map("state", add=TRUE)
title(paste("Spatially Stratified Sample,",samp2$ncont,"Controls"))
}
par(mfrow=c(1,1))
## Note that weights are needed in statistical analyses
# Prevalence of cases in sample--not in source data
mean(samp1$rdata$beer)
# Estimated prevalence of cases in source data
weighted.mean(samp1$rdata$beer, w=samp1$w)
## Do beer tweet odds differ below the 36.5 degree parallel?
# Using full data
glm(beer~I(latitude<36.5), family=binomial, data=beertweets)
# Stratified sample requires sampling weights
if (!is.null(samp2)) glm(beer~I(latitude<36.5), family=binomial,
data=samp2$rdata, weights=samp2$w)