variance-estimators {svrep}    R Documentation
Variance Estimators
Description
This help page describes variance estimators that are commonly used for survey samples. These variance estimators can be used as the basis of the generalized replication methods implemented with the functions as_fays_gen_rep_design(), as_gen_boot_design(), make_fays_gen_rep_factors(), or make_gen_boot_factors().
Shared Notation
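Let $s$ denote the selected sample of size $n$, with elements $i = 1, \dots, n$. Element $i$ in the sample had probability $\pi_i$ of being included in the sample. The pair of elements $(i, j)$ was sampled with probability $\pi_{ij}$.
The population total for a variable $y$ is denoted $Y = \sum_{i \in U} y_i$, where $U$ is the population, and the Horvitz-Thompson estimator for $Y$ is denoted $\hat{Y} = \sum_{i \in s} y_i / \pi_i$. For convenience, we denote $\breve{y}_i = y_i / \pi_i$.
The true sampling variance of $\hat{Y}$ is denoted $V(\hat{Y})$, while an estimator of this sampling variance is denoted $v(\hat{Y})$.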
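As a minimal illustration of this notation, the weighted values $\breve{y}_i$ and the Horvitz-Thompson total $\hat{Y}$ can be computed as follows. This is a plain-Python sketch, not part of svrep (an R package). The helper name `ht_total` and the sample values are hypothetical.

```python
def ht_total(y, pi):
    """Hypothetical helper: Horvitz-Thompson estimate of a population total.

    y  -- values y_i of the sampled elements
    pi -- inclusion probabilities pi_i of the same elements
    Returns (y_breve, Y_hat) with y_breve[i] = y_i / pi_i and Y_hat = sum(y_breve).
    """
    y_breve = [yi / p for yi, p in zip(y, pi)]
    return y_breve, sum(y_breve)


# Made-up sample of three elements
y_breve, Y_hat = ht_total([10.0, 4.0, 6.0], [0.5, 0.25, 0.5])
print(y_breve, Y_hat)  # [20.0, 16.0, 12.0] 48.0
```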
Horvitz-Thompson
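The Horvitz-Thompson variance estimator:
$$
v(\hat{Y}) = \sum_{i \in s} \sum_{j \in s} \left(1 - \frac{\pi_i \pi_j}{\pi_{ij}}\right) \breve{y}_i \breve{y}_j
$$
with the convention $\pi_{ii} = \pi_i$.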
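A plain-Python sketch of the double sum follows. It is not svrep's implementation, and the helper name `ht_variance` and its list-based inputs are hypothetical. The example uses simple random sampling without replacement (SRSWOR) of $n = 2$ from $N = 4$, so $\pi_i = 1/2$ and $\pi_{ij} = 1/6$ for $i \neq j$.

```python
def ht_variance(y, pi, pi_joint):
    """Hypothetical helper: Horvitz-Thompson variance estimator v(Y_hat).

    y        -- sampled values y_i
    pi       -- first-order inclusion probabilities pi_i
    pi_joint -- n x n matrix of joint probabilities pi_ij, with pi_joint[i][i] == pi[i]
    """
    n = len(y)
    y_breve = [y[i] / pi[i] for i in range(n)]
    v = 0.0
    for i in range(n):
        for j in range(n):
            v += (1 - pi[i] * pi[j] / pi_joint[i][j]) * y_breve[i] * y_breve[j]
    return v


# SRSWOR of n = 2 from N = 4: pi_i = 2/4 and pi_ij = (2 * 1) / (4 * 3) for i != j
pi = [0.5, 0.5]
pi_joint = [[0.5, 1 / 6], [1 / 6, 0.5]]
v_ht = ht_variance([1.0, 3.0], pi, pi_joint)
print(v_ht)  # approximately 8.0, i.e. N^2 (1 - n/N) s^2 / n with s^2 = 2
```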
Yates-Grundy
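The Yates-Grundy variance estimator:
$$
v(\hat{Y}) = -\frac{1}{2} \sum_{i \in s} \sum_{j \in s} \left(1 - \frac{\pi_i \pi_j}{\pi_{ij}}\right) \left(\breve{y}_i - \breve{y}_j\right)^2
$$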
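A plain-Python sketch of this estimator follows; the helper name `yg_variance` is hypothetical. It uses the same SRSWOR design of $n = 2$ from $N = 4$ ($\pi_i = 1/2$, $\pi_{ij} = 1/6$). In this example it gives the same value as the Horvitz-Thompson estimator, namely $N^2 (1 - n/N) s^2 / n = 8$.

```python
def yg_variance(y, pi, pi_joint):
    """Hypothetical helper: Yates-Grundy variance estimator v(Y_hat).

    y        -- sampled values y_i
    pi       -- first-order inclusion probabilities pi_i
    pi_joint -- n x n matrix of joint probabilities pi_ij; the diagonal
                (set to pi_i) contributes nothing because y_i - y_i = 0
    """
    n = len(y)
    y_breve = [y[i] / pi[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (1 - pi[i] * pi[j] / pi_joint[i][j]) * (y_breve[i] - y_breve[j]) ** 2
    return -0.5 * total


# SRSWOR of n = 2 from N = 4
pi = [0.5, 0.5]
pi_joint = [[0.5, 1 / 6], [1 / 6, 0.5]]
v_yg = yg_variance([1.0, 3.0], pi, pi_joint)
print(v_yg)  # approximately 8.0
```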
Poisson Horvitz-Thompson
The Poisson Horvitz-Thompson variance estimator is simply the Horvitz-Thompson variance estimator with $\pi_{ij} = \pi_i \pi_j$ for $i \neq j$, which is the case for Poisson sampling.
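When $\pi_{ij} = \pi_i \pi_j$, the off-diagonal factors $1 - \pi_i \pi_j / \pi_{ij}$ are zero. Only the diagonal terms survive, and with $\pi_{ii} = \pi_i$ their factor is $1 - \pi_i$. The estimator therefore reduces to $v(\hat{Y}) = \sum_{i \in s} (1 - \pi_i) \breve{y}_i^2$. The plain-Python sketch below checks this reduction numerically; the helper names and data are hypothetical.

```python
def poisson_ht_variance(y, pi):
    """Hypothetical helper: simplified Poisson form, sum_i (1 - pi_i) * (y_i / pi_i)^2."""
    return sum((1 - p) * (yi / p) ** 2 for yi, p in zip(y, pi))


def ht_double_sum(y, pi, pi_joint):
    """General Horvitz-Thompson double sum, used here only as a check."""
    n = len(y)
    yb = [y[i] / pi[i] for i in range(n)]
    return sum(
        (1 - pi[i] * pi[j] / pi_joint[i][j]) * yb[i] * yb[j]
        for i in range(n)
        for j in range(n)
    )


y, pi = [10.0, 4.0, 6.0], [0.5, 0.25, 0.5]  # made-up data
# Poisson sampling: pi_ij = pi_i * pi_j for i != j, and pi_ii = pi_i
pi_joint = [[pi[i] if i == j else pi[i] * pi[j] for j in range(3)] for i in range(3)]
print(poisson_ht_variance(y, pi), ht_double_sum(y, pi, pi_joint))  # 464.0 464.0
```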
Stratified Multistage SRS
The Stratified Multistage SRS variance estimator is the recursive variance estimator proposed by Bellhouse (1985) and used in the 'survey' package's function svyrecvar. In the case of simple random sampling without replacement (with one or more stages), this estimator exactly matches the Horvitz-Thompson estimator.
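The estimator can be used for any number of sampling stages. For illustration, we describe its use for two sampling stages.
$$
v(\hat{Y}) = \hat{V}_1 + \hat{V}_2
$$
where
$$
\hat{V}_1 = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(y_{hi\cdot} - \bar{y}_{h\cdot\cdot}\right)^2
$$
and
$$
\hat{V}_2 = \sum_{h=1}^{H} \frac{n_h}{N_h} \sum_{i=1}^{n_h} v_{hi}(y_{hi\cdot})
$$
where $H$ is the number of first-stage strata, $n_h$ is the number of sampled clusters in stratum $h$, $N_h$ is the number of population clusters in stratum $h$, $y_{hi\cdot}$ is the weighted cluster total in cluster $i$ of stratum $h$, $\bar{y}_{h\cdot\cdot}$ is the mean weighted cluster total of stratum $h$ ($\bar{y}_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi\cdot}$), and $v_{hi}(y_{hi\cdot})$ is the estimated sampling variance of $y_{hi\cdot}$.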
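A plain-Python sketch of the two-stage computation follows; it is not the svyrecvar implementation. The helper name `two_stage_srs_variance` and its input layout (one dict per first-stage stratum, with keys `N`, `y`, and `v2`) are hypothetical. The second-stage variances $v_{hi}(y_{hi\cdot})$ are taken as given inputs.

```python
def two_stage_srs_variance(strata):
    """Hypothetical helper: two-stage stratified SRS variance, returned as (V1, V2).

    strata -- list with one dict per first-stage stratum h, holding
        'N'  : number of population clusters N_h
        'y'  : weighted cluster totals y_hi. of the n_h sampled clusters (n_h >= 2)
        'v2' : estimated second-stage variances v_hi(y_hi.) of those totals
    """
    V1 = V2 = 0.0
    for stratum in strata:
        N_h, y = stratum["N"], stratum["y"]
        n_h = len(y)
        ybar = sum(y) / n_h  # mean weighted cluster total in stratum h
        ss = sum((yi - ybar) ** 2 for yi in y)
        V1 += (1 - n_h / N_h) * n_h / (n_h - 1) * ss  # first-stage component
        V2 += n_h / N_h * sum(stratum["v2"])  # second-stage component
    return V1, V2


# Made-up inputs for two first-stage strata
strata = [
    {"N": 10, "y": [20.0, 30.0, 40.0, 30.0], "v2": [2.0, 2.0, 2.0, 2.0]},
    {"N": 6, "y": [12.0, 18.0], "v2": [1.0, 3.0]},
]
V1, V2 = two_stage_srs_variance(strata)
print(V1, V2, V1 + V2)  # approximately 184, 4.533, and 188.533
```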
Ultimate Cluster
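The Ultimate Cluster variance estimator is simply the stratified multistage SRS variance estimator, but ignoring variances from later stages of sampling:
$$
v(\hat{Y}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(y_{hi\cdot} - \bar{y}_{h\cdot\cdot}\right)^2
$$
with $n_h$, $N_h$, $y_{hi\cdot}$, and $\bar{y}_{h\cdot\cdot}$ defined as for the stratified multistage SRS estimator.
This is the variance estimator used in the 'survey' package when the user specifies options(survey.ultimate.cluster = TRUE) or uses svyrecvar(..., one.stage = TRUE).
When the first-stage sampling fractions $n_h / N_h$ are small, analysts often omit the finite population corrections $(1 - n_h / N_h)$ when using the ultimate cluster estimator.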
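A plain-Python sketch of the ultimate cluster estimator follows; the helper name `ultimate_cluster_variance` and its `use_fpc` flag are hypothetical. Later-stage variances are simply not computed, and setting `use_fpc=False` drops the finite population corrections as described above.

```python
def ultimate_cluster_variance(strata, use_fpc=True):
    """Hypothetical helper: ultimate cluster variance estimate.

    strata  -- list of (N_h, cluster_totals) pairs, one per first-stage stratum,
               where cluster_totals holds the weighted totals y_hi. (n_h >= 2)
    use_fpc -- if False, drop the finite population correction (1 - n_h / N_h)
    """
    v = 0.0
    for N_h, y in strata:
        n_h = len(y)
        ybar = sum(y) / n_h  # mean weighted cluster total in stratum h
        ss = sum((yi - ybar) ** 2 for yi in y)
        fpc = (1 - n_h / N_h) if use_fpc else 1.0
        v += fpc * n_h / (n_h - 1) * ss
    return v


strata = [(10, [20.0, 30.0, 40.0, 30.0]), (6, [12.0, 18.0])]  # made-up data
v_fpc = ultimate_cluster_variance(strata)
v_no_fpc = ultimate_cluster_variance(strata, use_fpc=False)
print(v_fpc, v_no_fpc)  # approximately 184 and 302.67
```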
SD1 and SD2 (Successive Difference Estimators)
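The SD1 and SD2 variance estimators are "successive difference" estimators sometimes used for systematic sampling designs. Ash (2014) describes each estimator as follows:
$$
\hat{v}_{SD1}(\hat{Y}) = \left(1 - \frac{n}{N}\right) \frac{n}{2(n-1)} \sum_{k=2}^{n} \left(\breve{y}_k - \breve{y}_{k-1}\right)^2
$$
$$
\hat{v}_{SD2}(\hat{Y}) = \left(1 - \frac{n}{N}\right) \frac{1}{2} \left[\sum_{k=2}^{n} \left(\breve{y}_k - \breve{y}_{k-1}\right)^2 + \left(\breve{y}_n - \breve{y}_1\right)^2\right]
$$
where $\breve{y}_k = y_k / \pi_k$ is the weighted value of unit $k$ with selection probability $\pi_k$, the sampled units $k = 1, \dots, n$ are taken in their sort order, and $N$ is the population size. The SD1 estimator is recommended by Wolter (1984).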
The SD2 estimator is the basis of the successive difference replication estimator commonly
used for systematic sampling designs and is more conservative. See Ash (2014) for details.
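A plain-Python sketch of both successive difference estimators follows. The helper names are hypothetical, and the input `y_breve` is assumed to already be in the sample's sort order. The only difference between the two is the wrap-around term $(\breve{y}_n - \breve{y}_1)^2$ and the leading constant.

```python
def _successive_sq_diffs(y_breve):
    # Sum over k = 2..n of (y_breve_k - y_breve_{k-1})^2, in the given order
    return sum((y_breve[k] - y_breve[k - 1]) ** 2 for k in range(1, len(y_breve)))


def sd1_variance(y_breve, N):
    """Hypothetical helper: SD1 estimate; y_breve must be in sort order."""
    n = len(y_breve)
    return (1 - n / N) * n / (2 * (n - 1)) * _successive_sq_diffs(y_breve)


def sd2_variance(y_breve, N):
    """Hypothetical helper: SD2 estimate, adding the wrap-around term (y_breve_n - y_breve_1)^2."""
    n = len(y_breve)
    ss = _successive_sq_diffs(y_breve) + (y_breve[-1] - y_breve[0]) ** 2
    return (1 - n / N) * 0.5 * ss


y_breve = [10.0, 12.0, 11.0, 15.0]  # made-up weighted values, in sort order
v1 = sd1_variance(y_breve, N=20)
v2 = sd2_variance(y_breve, N=20)
print(v1, v2)  # approximately 11.2 and 18.4
```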
For multistage samples, SD1 and SD2 are applied to the clusters at each stage, separately by stratum.
For later stages of sampling, the variance estimate from a stratum is multiplied by the product
of sampling fractions from earlier stages of sampling. For example, at a third stage of sampling,
the variance estimate from a third-stage stratum is multiplied by $\frac{n_1}{N_1} \times \frac{n_2}{N_2}$,
which is the product of sampling fractions from the first-stage stratum and second-stage stratum.
Deville 1 and Deville 2
The "Deville-1" and "Deville-2" variance estimators are clearly described in Matei and Tillé (2005), and are intended for designs that use fixed-size, unequal-probability random sampling without replacement. These variance estimators have been shown to be effective for designs that use a fixed sample size with a high-entropy sampling method. This includes most PPSWOR sampling methods, but unequal-probability systematic sampling is an important exception.
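These variance estimators take the following form:
$$
\hat{v}(\hat{Y}) = \sum_{i \in s} c_i \left(\breve{y}_i - \frac{1}{\sum_{k \in s} c_k} \sum_{k \in s} c_k \breve{y}_k\right)^2
$$
where $\breve{y}_i = y_i / \pi_i$ is the weighted value of the variable of interest, and the coefficients $c_i$ depend on the method used:
- "Deville-1": $c_i = \left(1 - \pi_i\right) \frac{n}{n-1}$
- "Deville-2": $c_i = \left(1 - \pi_i\right) \left[1 - \sum_{k \in s} \left(\frac{1 - \pi_k}{\sum_{l \in s} (1 - \pi_l)}\right)^2\right]^{-1}$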
In the case of simple random sampling without replacement (SRSWOR), these estimators are both identical to the usual stratified multistage SRS estimator (which is itself a special case of the Horvitz-Thompson estimator).
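A plain-Python sketch of both variants follows. The helper name `deville_variance` is hypothetical, and its `method` strings mirror the labels used above rather than any svrep argument. The example checks the SRSWOR equivalence with $n = 2$ drawn from $N = 4$, where the usual SRS estimator gives 8 for $y = (1, 3)$.

```python
def deville_variance(y, pi, method="Deville-1"):
    """Hypothetical helper: Deville-1 or Deville-2 variance estimate of Y_hat."""
    n = len(y)
    y_breve = [yi / p for yi, p in zip(y, pi)]
    one_minus = [1 - p for p in pi]
    if method == "Deville-1":
        c = [a * n / (n - 1) for a in one_minus]
    elif method == "Deville-2":
        s = sum(one_minus)
        denom = 1 - sum((a / s) ** 2 for a in one_minus)
        c = [a / denom for a in one_minus]
    else:
        raise ValueError("method must be 'Deville-1' or 'Deville-2'")
    # c-weighted mean of the weighted values
    center = sum(ck * yk for ck, yk in zip(c, y_breve)) / sum(c)
    return sum(ci * (yi - center) ** 2 for ci, yi in zip(c, y_breve))


# SRSWOR of n = 2 from N = 4, so pi_i = 0.5 for both sampled units
v_d1 = deville_variance([1.0, 3.0], [0.5, 0.5], "Deville-1")
v_d2 = deville_variance([1.0, 3.0], [0.5, 0.5], "Deville-2")
print(v_d1, v_d2)  # 8.0 8.0
```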
For multistage samples, "Deville-1" and "Deville-2" are applied to the clusters at each stage, separately by stratum.
For later stages of sampling, the variance estimate from a stratum is multiplied by the product
of sampling probabilities from earlier stages of sampling. For example, at a third stage of sampling,
the variance estimate from a third-stage stratum is multiplied by $\pi_{(1)} \times \pi_{(2)}$, where $\pi_{(1)}$ is the sampling probability of the first-stage unit and $\pi_{(2)}$ is the sampling probability of the second-stage unit within the first-stage unit.
Deville-Tillé
See Section 6.8 of Tillé (2020) for more detail on this estimator, including an explanation of its quadratic form. See Deville and Tillé (2005) for the results of a simulation study comparing this and other alternative estimators for balanced sampling.
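The estimator can be written as follows:
$$
v(\hat{Y}) = \sum_{k \in s} \frac{c_k}{\pi_k^2} \left(y_k - \hat{y}_k^*\right)^2
$$
where
$$
\hat{y}_k^* = \mathbf{z}_k^{\top} \left(\sum_{\ell \in s} c_\ell \frac{\mathbf{z}_\ell \mathbf{z}_\ell^{\top}}{\pi_\ell^2}\right)^{-1} \sum_{\ell \in s} c_\ell \frac{\mathbf{z}_\ell y_\ell}{\pi_\ell^2}
$$
and $\mathbf{z}_k$ denotes the vector of auxiliary variables for observation $k$ included in sample $s$, with inclusion probability $\pi_k$. The value $c_k$ is set to $\frac{n}{n-q}(1 - \pi_k)$, where $n$ is the number of observations and $q$ is the number of auxiliary variables.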
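A plain-Python sketch of this estimator follows. The helper names `deville_tille_variance` and `solve_linear` are hypothetical, and `z` holds one row of $q$ auxiliary values per sampled unit. Instead of forming the matrix inverse explicitly, the sketch solves the $q \times q$ system by Gauss-Jordan elimination; this is a choice made for the sketch only. Two checks follow. With the single auxiliary variable $z_k = \pi_k$ under SRSWOR ($n = 2$ from $N = 4$), it reproduces the SRS value 8. When $y_k$ is an exact linear combination of $\mathbf{z}_k$, the residuals vanish and the estimate is essentially zero.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small square A)."""
    m = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, m + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][m] / M[i][i] for i in range(m)]


def deville_tille_variance(y, pi, z):
    """Hypothetical helper: Deville-Tille variance estimate with c_k = n/(n-q) * (1 - pi_k)."""
    n, q = len(y), len(z[0])
    c = [n / (n - q) * (1 - p) for p in pi]
    # A = sum_l c_l z_l z_l^T / pi_l^2  and  b = sum_l c_l z_l y_l / pi_l^2
    A = [[sum(c[l] * z[l][r] * z[l][t] / pi[l] ** 2 for l in range(n)) for t in range(q)]
         for r in range(q)]
    b = [sum(c[l] * z[l][r] * y[l] / pi[l] ** 2 for l in range(n)) for r in range(q)]
    beta = solve_linear(A, b)
    total = 0.0
    for k in range(n):
        y_star = sum(z[k][r] * beta[r] for r in range(q))  # fitted value y_k^*
        total += c[k] / pi[k] ** 2 * (y[k] - y_star) ** 2
    return total


# Check 1: SRSWOR of n = 2 from N = 4, with z_k = pi_k as the only auxiliary variable
v_srs = deville_tille_variance([1.0, 3.0], [0.5, 0.5], [[0.5], [0.5]])

# Check 2: made-up data where y_k = 3 * pi_k + 2 * x_k exactly (q = 2)
pi2 = [0.2, 0.4, 0.5, 0.8]
xs = [1.0, 5.0, 2.0, 7.0]
z2 = [[p, x] for p, x in zip(pi2, xs)]
y2 = [3 * p + 2 * x for p, x in zip(pi2, xs)]
v_exact = deville_tille_variance(y2, pi2, z2)
print(v_srs, v_exact)  # v_srs is 8.0; v_exact is essentially zero
```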
References
Ash, S. (2014). "Using successive difference replication for estimating variances." Survey Methodology, Statistics Canada, 40(1), 47–59.
Bellhouse, D.R. (1985). "Computing Methods for Variance Estimation in Complex Surveys." Journal of Official Statistics, 1(3).
Deville, J.-C., and Tillé, Y. (2005). "Variance approximation under balanced sampling." Journal of Statistical Planning and Inference, 128, 569–591.
Matei, A., and Tillé, Y. (2005). "Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size." Journal of Official Statistics, 21(4), 543–570.
Tillé, Y. (2020). "Sampling and estimation from finite populations." (I. Hekimi, Trans.). Wiley.