Var_approx {UPSvarApprox} | R Documentation |
Approximate the Variance of the Horvitz-Thompson estimator
Description
Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.
Usage
Var_approx(y, pik, n, method, ...)
Arguments
y |
numeric vector containing the values of the variable of interest for all population units |
pik |
numeric vector of first-order inclusion probabilities, of length equal to population size |
n |
a scalar indicating the sample size |
method |
string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint". |
... |
two optional parameters can be modified to control the iterative procedure in method="FixedPoint" |
Details
The variance approximations available in this function are described below; the notation follows Matei and Tillé (2005).
Hájek variance approximation (method="Hajek1"):
\tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2}(y_i - y_i^*)^2
where
y_i^* = \pi_i \frac{ \sum_{j\in U} b_j y_j/\pi_j }{ \sum_{j \in U} b_j }
and
b_i = \frac{ \pi_i(1-\pi_i)N }{ N-1 }
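The Hájek formula above can be transcribed almost term by term. The following is a minimal sketch, not the package's implementation; `hajek1_sketch` is a hypothetical helper name and the four-unit population is made up for illustration.

```r
# Hypothetical helper (not part of UPSvarApprox): direct transcription of the
# "Hajek1" formula, using only first-order inclusion probabilities.
hajek1_sketch <- function(y, pik) {
  N <- length(pik)
  b <- pik * (1 - pik) * N / (N - 1)        # b_i
  ystar <- pik * sum(b * y / pik) / sum(b)  # y_i^*
  sum(b / pik^2 * (y - ystar)^2)
}

# Toy population (illustrative values only); sum(pik) equals n = 2
pik <- c(0.2, 0.4, 0.6, 0.8)
y   <- c(1, 3, 5, 9)
hajek1_sketch(y, pik)
```

A quick sanity check on the formula: when y is exactly proportional to pik, every y_i equals y_i^* and the approximation is zero.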
Starting from Hájek (1964), Brewer (2002) defined the following estimator (method="Hajek2"):
\tilde{Var} = \sum_{i \in U} \pi_i(1-\pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2
where
\tilde{Y} = \sum_{i \in U} a_i y_i
and
a_i = n(1-\pi_i)/\sum_{j \in U} \pi_j(1-\pi_j)
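As a rough illustration of the Brewer form above, the weights a_i and the weighted total \tilde{Y} can be computed in two lines. This is a sketch under the stated formulas only; `hajek2_sketch` is a hypothetical name and the data are invented.

```r
# Hypothetical helper (not part of UPSvarApprox): transcription of the
# "Hajek2" formula.
hajek2_sketch <- function(y, pik, n) {
  a <- n * (1 - pik) / sum(pik * (1 - pik))  # a_i
  Ytilde <- sum(a * y)                       # \tilde{Y}
  sum(pik * (1 - pik) * (y / pik - Ytilde / n)^2)
}

# Toy population (illustrative values only); sum(pik) equals n = 2
pik <- c(0.2, 0.4, 0.6, 0.8)
y   <- c(1, 3, 5, 9)
hajek2_sketch(y, pik, n = 2)
```

If y_i = c \pi_i for some constant c, then \tilde{Y} = nc and every squared term vanishes, so the sketch returns zero up to rounding.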
Hartley and Rao (1962) variance approximation (method="HartleyRao1"):
\tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2
\qquad - \frac{n-1}{n^2} \sum_{i \in U} \Biggl( 2\pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Biggr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2
\quad \qquad + \frac{2(n-1)}{n^3} \Biggl( \sum_{i \in U}\pi_i y_i - \frac{Y}{n}\sum_{i\in U} \pi_i^2 \Biggr)^2
where Y = \sum_{i \in U} y_i is the population total.
Hartley and Rao (1962) provide a simplified version of the variance above (method="HartleyRao2"):
\tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2
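The two Hartley-Rao expressions share their first term, so one function can cover both. The sketch below assumes that Y denotes the population total of y; `hartley_rao_sketch` and its `simplified` argument are hypothetical names, not the package's interface.

```r
# Hypothetical helper (not part of UPSvarApprox). Assumes Y = sum(y).
hartley_rao_sketch <- function(y, pik, n, simplified = FALSE) {
  Y  <- sum(y)               # population total (assumed meaning of Y)
  d2 <- (y / pik - Y / n)^2  # squared deviations shared by terms 1 and 2
  term1 <- sum(pik * (1 - (n - 1) / n * pik) * d2)
  if (simplified) return(term1)  # "HartleyRao2" keeps the first term only
  term2 <- (n - 1) / n^2 * sum((2 * pik^3 - pik^2 / 2 * sum(pik^2)) * d2)
  term3 <- 2 * (n - 1) / n^3 * (sum(pik * y) - Y / n * sum(pik^2))^2
  term1 - term2 + term3          # "HartleyRao1"
}

# Toy population (illustrative values only); sum(pik) equals n = 2
pik <- c(0.2, 0.4, 0.6, 0.8)
y   <- c(1, 3, 5, 9)
hartley_rao_sketch(y, pik, n = 2)                     # full form
hartley_rao_sketch(y, pik, n = 2, simplified = TRUE)  # simplified form
```

Comparing the two outputs on the same data shows how much the second and third terms contribute to the full approximation.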
method="FixedPoint" computes the Fixed-Point variance approximation proposed by Deville and Tillé (2005). The variance can be expressed in the same form as in method="Hajek1", and the coefficients b_i are computed iteratively by the algorithm:
- b_i^{(0)} = \pi_i (1-\pi_i) \frac{N}{N-1}, \,\, \forall i \in U
- b_i^{(k)} = \frac{(b_i^{(k-1)})^2 }{\sum_{j\in U} b_j^{(k-1)} } + \pi_i(1-\pi_i)
A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:
b_i = \pi_i(1-\pi_i)\Biggl( \frac{N\pi_i(1-\pi_i)}{ (N-1)\sum_{j\in U}\pi_j(1-\pi_j) } + 1 \Biggr)
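The iteration above can be sketched as a short loop. This is an illustration, not the package's code: the function names, `max_iter` and `tol` are hypothetical, and because the text does not spell out the necessary condition for convergence, the sketch substitutes a simpler rule and falls back to the one-iteration formula whenever the loop fails to converge within `max_iter` steps.

```r
# Hypothetical helpers (not part of UPSvarApprox).
fixed_point_b_sketch <- function(pik, max_iter = 1000, tol = 1e-8) {
  N  <- length(pik)
  c0 <- pik * (1 - pik)
  b  <- c0 * N / (N - 1)               # b_i^{(0)}
  for (k in seq_len(max_iter)) {
    b_new <- b^2 / sum(b) + c0         # b_i^{(k)}
    if (max(abs(b_new - b)) < tol) return(b_new)
    b <- b_new
  }
  # Stand-in fallback (assumption): the one-iteration closed form
  c0 * (N * c0 / ((N - 1) * sum(c0)) + 1)
}

# Plug the coefficients into the same expression used for "Hajek1"
fixed_point_var_sketch <- function(y, pik, ...) {
  b <- fixed_point_b_sketch(pik, ...)
  ystar <- pik * sum(b * y / pik) / sum(b)
  sum(b / pik^2 * (y - ystar)^2)
}

# Toy population (illustrative values only)
pik <- c(0.2, 0.4, 0.6, 0.8)
y   <- c(1, 3, 5, 9)
fixed_point_var_sketch(y, pik)
```

The fallback expression is what a single update yields when started from b_i^{(0)}, which is why it can be written in closed form.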
Value
a scalar, the approximated variance.
References
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Examples
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )
pik <- n * x/sum(x)
pikl <- outer(pik, pik, '*'); diag(pikl) <- pik
### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")