as_bootstrap_design {svrep} | R Documentation |
Convert a survey design object to a bootstrap replicate design
Description
Converts a survey design object to a replicate design object with replicate weights formed using a bootstrap method. Supports stratified, cluster samples with one or more stages of sampling. At each stage of sampling, either simple random sampling (with or without replacement) or unequal probability sampling (with or without replacement) may be used.
Usage
as_bootstrap_design(
design,
type = "Rao-Wu-Yue-Beaumont",
replicates = 500,
compress = TRUE,
mse = getOption("survey.replicates.mse"),
samp_method_by_stage = NULL
)
Arguments
design |
A survey design object created using the 'survey' (or 'srvyr') package,
with class |
type |
The type of bootstrap to use, which should be chosen based
on its applicability to the sampling method used for the survey.
The available types are the following:
|
replicates |
Number of bootstrap replicates (should be as large as possible, given computer memory/storage limitations). A commonly-recommended default is 500. |
compress |
Use a compressed representation of the replicate weights matrix. This reduces the computer memory required to represent the replicate weights and has no impact on estimates. |
mse |
If |
samp_method_by_stage |
(Optional). By default, this function will automatically determine the sampling method used at each stage.
However, this argument can be used to ensure the correct sampling method is identified for each stage.
|
Value
A replicate design object, with class svyrep.design
, which can be used with the usual functions,
such as svymean()
or svyglm()
.
Use weights(..., type = 'analysis')
to extract the matrix of replicate weights.
Use as_data_frame_with_weights()
to convert the design object to a data frame with columns
for the full-sample and replicate weights.
References
Beaumont, J.-F.; Émond, N. (2022). "A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase." Stats, 5: 339–357. https://doi.org/10.3390/stats5020019
Canty, A.J.; Davison, A.C. (1999). "Resampling-based variance estimation for labour force surveys." The Statistician, 48: 379-391.
Preston, J. (2009). "Rescaled bootstrap for stratified multistage sampling." Survey Methodology, 35(2): 227-234.
Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). "Some recent work on resampling methods for complex surveys." Survey Methodology, 18: 209–217.
See Also
Use estimate_boot_reps_for_target_cv
to help choose the number of bootstrap replicates.
The underlying function for the Rao-Wu-Yue-Beaumont bootstrap
is make_rwyb_bootstrap_weights
.
Other bootstrap methods are implemented using functions from the 'survey' package,
including: bootweights
(Canty-Davison),
subbootweights
(Rao-Wu),
and mrbweights
(Preston).
For systematic samples, one-PSU-per-stratum designs, or other especially complex sample designs,
one can use the generalized survey bootstrap method. See as_gen_boot_design
or make_gen_boot_factors
.
Examples
library(survey)
# Example 1: A multistage sample with two stages of SRSWOR
## Load an example dataset from a multistage sample, with two stages of SRSWOR
data("mu284", package = 'survey')
multistage_srswor_design <- svydesign(data = mu284,
ids = ~ id1 + id2,
fpc = ~ n1 + n2)
## Convert the survey design object to a bootstrap design
set.seed(2022)
bootstrap_rep_design <- as_bootstrap_design(multistage_srswor_design,
replicates = 500)
## Compare std. error estimates from bootstrap versus linearization
data.frame(
'Statistic' = c('total', 'mean', 'median'),
'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)),
SE(svymean(x = ~ y1, design = bootstrap_rep_design)),
SE(svyquantile(x = ~ y1, quantile = 0.5,
design = bootstrap_rep_design))),
'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)),
SE(svymean(x = ~ y1, design = multistage_srswor_design)),
SE(svyquantile(x = ~ y1, quantile = 0.5,
design = multistage_srswor_design))),
check.names = FALSE
)
# Example 2: A multistage-sample,
# first stage selected with unequal probabilities without replacement
# second stage selected with simple random sampling without replacement
data("library_multistage_sample", package = "svrep")
multistage_pps <- svydesign(data = library_multistage_sample,
ids = ~ PSU_ID + SSU_ID,
fpc = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB,
pps = "brewer")
bootstrap_rep_design <- as_bootstrap_design(
multistage_pps, replicates = 500,
samp_method_by_stage = c("PPSWOR", "SRSWOR")
)
## Compare std. error estimates from bootstrap versus linearization
data.frame(
'Statistic' = c('total', 'mean'),
'SE (bootstrap)' = c(
SE(svytotal(x = ~ TOTCIR, na.rm = TRUE,
design = bootstrap_rep_design)),
SE(svymean(x = ~ TOTCIR, na.rm = TRUE,
design = bootstrap_rep_design))),
'SE (linearization)' = c(
SE(svytotal(x = ~ TOTCIR, na.rm = TRUE,
design = multistage_pps)),
SE(svymean(x = ~ TOTCIR, na.rm = TRUE,
design = multistage_pps))),
check.names = FALSE
)