make_gen_boot_factors {svrep} | R Documentation |
Creates replicate factors for the generalized survey bootstrap
Description
Creates replicate factors for the generalized survey bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).
Usage
make_gen_boot_factors(Sigma, num_replicates, tau = "auto", exact_vcov = FALSE)
Arguments
Sigma |
The matrix of the quadratic form used to represent the variance estimator. Must be positive semidefinite. |
num_replicates |
The number of bootstrap replicates to create. |
tau |
Either |
exact_vcov |
If |
Value
A matrix with the same number of rows as Sigma
, and the number of columns
equal to num_replicates
. The object has an attribute named tau
which can be retrieved
by calling attr(which = 'tau')
on the object. The value tau
is a rescaling factor
which was used to avoid negative weights.
In addition, the object has attributes named scale
and rscales
which can be
passed directly to svrepdesign. Note that the value of scale
is ,
while the value of
rscales
is vector of length , with every entry equal to
.
Statistical Details
Let be the textbook variance estimator for an estimated population total
of some variable
.
The base weight for case
in our sample is
, and we let
denote the weighted value
.
Suppose we can represent our textbook variance estimator as a quadratic form:
,
for some
matrix
.
The only constraint on
is that, for our sample, it must be symmetric and positive semidefinite.
The bootstrapping process creates sets of replicate weights, where the
-th set of replicate weights is a vector of length
denoted
, whose
-th value is denoted
.
This yields
replicate estimates of the population total,
, for
, which can be used to estimate sampling variance.
This bootstrap variance estimator can be written as a quadratic form:
where
Note that if the vector of adjustment factors has expectation
and variance-covariance matrix
,
then we have the bootstrap expectation
. Since the bootstrap process takes the sample values
as fixed, the bootstrap expectation of the variance estimator is
.
Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating
from a distribution with the following two conditions:
Condition 1:
Condition 2:
While there are multiple ways to generate adjustment factors satisfying these conditions,
the simplest general method is to simulate from a multivariate normal distribution: .
This is the method used by this function.
Details on Rescaling to Avoid Negative Adjustment Factors
Let denote the
matrix of bootstrap adjustment factors.
To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors
by rescaling each adjustment factor
as follows:
where for all
in
and all
in
.
The value of can be set based on the realized adjustment factor matrix
or by choosing
prior to generating the adjustment factor matrix
so that
is likely to be large enough to prevent negative bootstrap weights.
If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes instead of
.
When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use replication scale factor rather than
when estimating sampling variances.
References
The generalized survey bootstrap was first proposed by Bertail and Combris (1997).
See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap.
The generalized survey bootstrap represents one strategy for forming replication variance estimators
in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984).
- Beaumont, Jean-François, and Zdenek Patak. 2012. “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x.
- Bertail, and Combris. 1997. “Bootstrap Généralisé d’un Sondage.” Annales d’Économie Et de Statistique, no. 46: 49. https://doi.org/10.2307/20076068.
- Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf.
- Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf.
See Also
The function make_quad_form_matrix
can be used to
represent several common variance estimators as a quadratic form's matrix,
which can then be used as an input to make_gen_boot_factors()
.
Examples
## Not run:
library(survey)
# Load an example dataset that uses unequal probability sampling ----
data('election', package = 'survey')
# Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ----
n <- nrow(election_pps)
pi <- election_jointprob
horvitz_thompson_matrix <- matrix(nrow = n, ncol = n)
for (i in seq_len(n)) {
for (j in seq_len(n)) {
horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j]
}
}
## Equivalently:
horvitz_thompson_matrix <- make_quad_form_matrix(
variance_estimator = "Horvitz-Thompson",
joint_probs = election_jointprob
)
# Make generalized bootstrap adjustment factors ----
bootstrap_adjustment_factors <- make_gen_boot_factors(
Sigma = horvitz_thompson_matrix,
num_replicates = 80,
tau = 'auto'
)
# Determine replication scale factor for variance estimation ----
tau <- attr(bootstrap_adjustment_factors, 'tau')
B <- ncol(bootstrap_adjustment_factors)
replication_scaling_constant <- tau^2 / B
# Create a replicate design object ----
election_pps_bootstrap_design <- svrepdesign(
data = election_pps,
weights = 1 / diag(election_jointprob),
repweights = bootstrap_adjustment_factors,
combined.weights = FALSE,
type = "other",
scale = attr(bootstrap_adjustment_factors, 'scale'),
rscales = attr(bootstrap_adjustment_factors, 'rscales')
)
# Compare estimates to Horvitz-Thompson estimator ----
election_pps_ht_design <- svydesign(
id = ~1,
fpc = ~p,
data = election_pps,
pps = ppsmat(election_jointprob),
variance = "HT"
)
svytotal(x = ~ Bush + Kerry,
design = election_pps_bootstrap_design)
svytotal(x = ~ Bush + Kerry,
design = election_pps_ht_design)
## End(Not run)