make_rwyb_bootstrap_weights {svrep} | R Documentation |
Create bootstrap replicate weights for a general survey design, using the Rao-Wu-Yue-Beaumont bootstrap method
Description
Creates bootstrap replicate weights for a multistage stratified sample design
using the method of Beaumont and Émond (2022), which is a generalization of the Rao-Wu-Yue bootstrap.
The design may have different sampling methods used at different stages.
Each stage of sampling may potentially use unequal probabilities (with or without replacement)
and may potentially use Poisson sampling.
Usage
make_rwyb_bootstrap_weights(
num_replicates = 100,
samp_unit_ids,
strata_ids,
samp_unit_sel_probs,
samp_method_by_stage = rep("PPSWOR", times = ncol(samp_unit_ids)),
allow_final_stage_singletons = TRUE,
output = "weights"
)
Arguments
num_replicates |
Positive integer giving the number of bootstrap replicates to create |
samp_unit_ids |
Matrix or data frame of sampling unit IDs for each stage of sampling |
strata_ids |
Matrix or data frame of strata IDs for each sampling unit at each stage of sampling |
samp_unit_sel_probs |
Matrix or data frame of selection probabilities for each sampling unit at each stage of sampling. |
samp_method_by_stage |
A vector with length equal to the number of stages of sampling,
corresponding to the number of columns in
|
allow_final_stage_singletons |
Logical value indicating whether to allow
non-certainty singleton strata at the final sampling stage (rather than throw an error message). |
output |
Either |
Details
Beaumont and Émond (2022) describe a general algorithm for forming bootstrap replicate weights for multistage stratified samples, based on the method of Rao-Wu-Yue, with extensions to sampling without replacement and use of unequal probabilities of selection (i.e., sampling with probability proportional to size) as well as Poisson sampling. These methods are guaranteed to produce nonnegative replicate weights and provide design-unbiased and design-consistent variance estimates for totals, for designs where sampling uses one or more of the following methods:
"SRSWOR" - Simple random sampling, without replacement
"SRSWR" - Simple random sampling, with replacement
"PPSWR" - Unequal probabilities of selection, with replacement
"Poisson" - Poisson sampling: each sampling unit is selected into the sample at most once, with potentially different probabilities of inclusion for each sampling unit.
For designs where at least one stage's strata have sampling without replacement with unequal probabilities of selection ("PPSWOR"),
the bootstrap method of Beaumont and Émond (2022) is guaranteed to produce nonnegative weights, but is not design-unbiased,
since the method only approximates the joint selection probabilities which would be needed for unbiased estimation.
Unless any stages use simple random sampling without replacement, the resulting bootstrap replicate weights
are guaranteed to all be strictly positive, which may be useful for calibration or analyses of domains with small sample sizes.
If any stages use simple random sampling without replacement, it is possible for some replicate weights to be zero.
If there is survey nonresponse, it may be useful to represent the response/nonresponse as an additional
stage of sampling, where sampling is conducted with Poisson sampling
where each unit's "selection probability" at that stage is its response propensity (which typically has to be estimated).
Value
A matrix of with the same number of rows as samp_unit_ids
and the number of columns equal to the value of the argument num_replicates
.
Specifying output = "factors"
returns a matrix of replicate adjustment factors which can later be multiplied by
the full-sample weights to produce a matrix of replicate weights.
Specifying output = "weights"
returns the matrix of replicate weights,
where the full-sample weights are inferred using samp_unit_sel_probs
.
References
Beaumont, J.-F.; Émond, N. (2022). "A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase." Stats, 5: 339–357. https://doi.org/10.3390/stats5020019
Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). "Some recent work on resampling methods for complex surveys." Surv. Methodol., 18: 209–217.
See Also
If the survey design can be accurately represented using svydesign
,
then it is easier to simply use as_bootstrap_design
with argument type = "Rao-Wu-Yue-Beaumont"
.
Use estimate_boot_reps_for_target_cv
to help choose the number of bootstrap replicates.
Examples
## Not run:
library(survey)
# Example 1: A multistage sample with two stages of SRSWOR
## Load an example dataset from a multistage sample, with two stages of SRSWOR
data("mu284", package = 'survey')
multistage_srswor_design <- svydesign(data = mu284,
ids = ~ id1 + id2,
fpc = ~ n1 + n2)
## Create bootstrap replicate weights
set.seed(2022)
bootstrap_replicate_weights <- make_rwyb_bootstrap_weights(
num_replicates = 5000,
samp_unit_ids = multistage_srswor_design$cluster,
strata_ids = multistage_srswor_design$strata,
samp_unit_sel_probs = multistage_srswor_design$fpc$sampsize /
multistage_srswor_design$fpc$popsize,
samp_method_by_stage = c("SRSWOR", "SRSWOR")
)
## Create a replicate design object with the survey package
bootstrap_rep_design <- svrepdesign(
data = multistage_srswor_design$variables,
repweights = bootstrap_replicate_weights,
weights = weights(multistage_srswor_design, type = "sampling"),
type = "bootstrap"
)
## Compare std. error estimates from bootstrap versus linearization
data.frame(
'Statistic' = c('total', 'mean', 'median'),
'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)),
SE(svymean(x = ~ y1, design = bootstrap_rep_design)),
SE(svyquantile(x = ~ y1, quantile = 0.5,
design = bootstrap_rep_design))),
'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)),
SE(svymean(x = ~ y1, design = multistage_srswor_design)),
SE(svyquantile(x = ~ y1, quantile = 0.5,
design = multistage_srswor_design))),
check.names = FALSE
)
# Example 2: A single-stage sample selected with unequal probabilities, without replacement
## Load an example dataset of U.S. counties states with 2004 Presidential vote counts
data("election", package = 'survey')
pps_wor_design <- svydesign(data = election_pps,
pps = "overton",
fpc = ~ p, # Inclusion probabilities
ids = ~ 1)
## Create bootstrap replicate weights
set.seed(2022)
bootstrap_replicate_weights <- make_rwyb_bootstrap_weights(
num_replicates = 5000,
samp_unit_ids = pps_wor_design$cluster,
strata_ids = pps_wor_design$strata,
samp_unit_sel_probs = pps_wor_design$prob,
samp_method_by_stage = c("PPSWOR")
)
## Create a replicate design object with the survey package
bootstrap_rep_design <- svrepdesign(
data = pps_wor_design$variables,
repweights = bootstrap_replicate_weights,
weights = weights(pps_wor_design, type = "sampling"),
type = "bootstrap"
)
## Compare std. error estimates from bootstrap versus linearization
data.frame(
'Statistic' = c('total', 'mean'),
'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)),
SE(svymean(x = ~ I(Bush/votes),
design = bootstrap_rep_design))),
'SE (Overton\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush,
design = pps_wor_design)),
SE(svymean(x = ~ I(Bush/votes),
design = pps_wor_design))),
check.names = FALSE
)
## End(Not run)