make_gen_boot_factors {svrep} | R Documentation |
Creates replicate factors for the generalized survey bootstrap
Description
Creates replicate factors for the generalized survey bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).
Usage
make_gen_boot_factors(Sigma, num_replicates, tau = "auto", exact_vcov = FALSE)
Arguments
Sigma |
The matrix of the quadratic form used to represent the variance estimator. Must be positive semidefinite. |
num_replicates |
The number of bootstrap replicates to create. |
tau |
Either |
exact_vcov |
If |
Value
A matrix with the same number of rows as Sigma
, and the number of columns
equal to num_replicates
. The object has an attribute named tau
which can be retrieved
by calling attr(which = 'tau')
on the object. The value tau
is a rescaling factor
which was used to avoid negative weights.
In addition, the object has attributes named scale
and rscales
which can be
passed directly to svrepdesign. Note that the value of scale
is \tau^2/B
,
while the value of rscales
is vector of length B
, with every entry equal to 1
.
Statistical Details
Let v( \hat{T_y})
be the textbook variance estimator for an estimated population total \hat{T}_y
of some variable y
.
The base weight for case i
in our sample is w_i
, and we let \breve{y}_i
denote the weighted value w_iy_i
.
Suppose we can represent our textbook variance estimator as a quadratic form: v(\hat{T}_y) = \breve{y}\Sigma\breve{y}^T
,
for some n \times n
matrix \Sigma
.
The only constraint on \Sigma
is that, for our sample, it must be symmetric and positive semidefinite.
The bootstrapping process creates B
sets of replicate weights, where the b
-th set of replicate weights is a vector of length n
denoted \mathbf{a}^{(b)}
, whose k
-th value is denoted a_k^{(b)}
.
This yields B
replicate estimates of the population total, \hat{T}_y^{*(b)}=\sum_{k \in s} a_k^{(b)} \breve{y}_k
, for b=1, \ldots B
, which can be used to estimate sampling variance.
v_B\left(\hat{T}_y\right)=\frac{\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2}{B}
This bootstrap variance estimator can be written as a quadratic form:
v_B\left(\hat{T}_y\right) =\mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}}
where
\boldsymbol{\Sigma}_B = \frac{\sum_{b=1}^B\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)^{\prime}}{B}
Note that if the vector of adjustment factors \mathbf{a}^{(b)}
has expectation \mathbf{1}_n
and variance-covariance matrix \boldsymbol{\Sigma}
,
then we have the bootstrap expectation E_{*}\left( \boldsymbol{\Sigma}_B \right) = \boldsymbol{\Sigma}
. Since the bootstrap process takes the sample values \breve{y}
as fixed, the bootstrap expectation of the variance estimator is E_{*} \left( \mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}}\right)= \mathbf{\breve{y}}^{\prime}\Sigma \mathbf{\breve{y}}
.
Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \mathbf{a}^{(b)}
from a distribution with the following two conditions:
Condition 1: \quad \mathbf{E}_*(\mathbf{a})=\mathbf{1}_n
Condition 2: \quad \mathbf{E}_*\left(\mathbf{a}-\mathbf{1}_n\right)\left(\mathbf{a}-\mathbf{1}_n\right)^{\prime}=\mathbf{\Sigma}
While there are multiple ways to generate adjustment factors satisfying these conditions,
the simplest general method is to simulate from a multivariate normal distribution: \mathbf{a} \sim MVN(\mathbf{1}_n, \boldsymbol{\Sigma})
.
This is the method used by this function.
Details on Rescaling to Avoid Negative Adjustment Factors
Let \mathbf{A} = \left[ \mathbf{a}^{(1)} \cdots \mathbf{a}^{(b)} \cdots \mathbf{a}^{(B)} \right]
denote the (n \times B)
matrix of bootstrap adjustment factors.
To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \mathbf{A}^S
by rescaling each adjustment factor a_k^{(b)}
as follows:
a_k^{S,(b)} = \frac{a_k^{(b)} + \tau - 1}{\tau}
where \tau \geq 1 - a_k^{(b)} \geq 1
for all k
in \left\{ 1,\ldots,n \right\}
and all b
in \left\{1, \ldots, B\right\}
.
The value of \tau
can be set based on the realized adjustment factor matrix \mathbf{A}
or by choosing \tau
prior to generating the adjustment factor matrix \mathbf{A}
so that \tau
is likely to be large enough to prevent negative bootstrap weights.
If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes \frac{\tau^2}{B}
instead of \frac{1}{B}
.
\textbf{Prior to rescaling: } v_B\left(\hat{T}_y\right) = \frac{1}{B}\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2
\textbf{After rescaling: } v_B\left(\hat{T}_y\right) = \frac{\tau^2}{B}\sum_{b=1}^B\left(\hat{T}_y^{S*(b)}-\hat{T}_y\right)^2
When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use replication scale factor \frac{\tau^2}{B}
rather than \frac{1}{B}
when estimating sampling variances.
References
The generalized survey bootstrap was first proposed by Bertail and Combris (1997).
See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap.
The generalized survey bootstrap represents one strategy for forming replication variance estimators
in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984).
- Beaumont, Jean-François, and Zdenek Patak. 2012. “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x.
- Bertail, and Combris. 1997. “Bootstrap Généralisé d’un Sondage.” Annales d’Économie Et de Statistique, no. 46: 49. https://doi.org/10.2307/20076068.
- Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf.
- Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf.
See Also
The function make_quad_form_matrix
can be used to
represent several common variance estimators as a quadratic form's matrix,
which can then be used as an input to make_gen_boot_factors()
.
Examples
## Not run:
library(survey)
# Load an example dataset that uses unequal probability sampling ----
data('election', package = 'survey')
# Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ----
n <- nrow(election_pps)
pi <- election_jointprob
horvitz_thompson_matrix <- matrix(nrow = n, ncol = n)
for (i in seq_len(n)) {
for (j in seq_len(n)) {
horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j]
}
}
## Equivalently:
horvitz_thompson_matrix <- make_quad_form_matrix(
variance_estimator = "Horvitz-Thompson",
joint_probs = election_jointprob
)
# Make generalized bootstrap adjustment factors ----
bootstrap_adjustment_factors <- make_gen_boot_factors(
Sigma = horvitz_thompson_matrix,
num_replicates = 80,
tau = 'auto'
)
# Determine replication scale factor for variance estimation ----
tau <- attr(bootstrap_adjustment_factors, 'tau')
B <- ncol(bootstrap_adjustment_factors)
replication_scaling_constant <- tau^2 / B
# Create a replicate design object ----
election_pps_bootstrap_design <- svrepdesign(
data = election_pps,
weights = 1 / diag(election_jointprob),
repweights = bootstrap_adjustment_factors,
combined.weights = FALSE,
type = "other",
scale = attr(bootstrap_adjustment_factors, 'scale'),
rscales = attr(bootstrap_adjustment_factors, 'rscales')
)
# Compare estimates to Horvitz-Thompson estimator ----
election_pps_ht_design <- svydesign(
id = ~1,
fpc = ~p,
data = election_pps,
pps = ppsmat(election_jointprob),
variance = "HT"
)
svytotal(x = ~ Bush + Kerry,
design = election_pps_bootstrap_design)
svytotal(x = ~ Bush + Kerry,
design = election_pps_ht_design)
## End(Not run)