R: Approximate the Variance of the Horvitz-Thompson estimator

Var_approx {UPSvarApprox}

R Documentation

Description

Computes approximations of the variance of the Horvitz-Thompson estimator for high-entropy sampling designs. These approximations require only first-order inclusion probabilities.

Usage

Var_approx(y, pik, n, method, ...)

Arguments

`y`	numeric vector containing the values of the variable of interest for all population units
`pik`	numeric vector of first-order inclusion probabilities, of length equal to population size
`n`	a scalar indicating the sample size
`method`	string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint".
`...`	two optional parameters that control the iterative procedure used when `method="FixedPoint"`: `maxIter` sets the maximum number of iterations and `eps` sets the tolerance on the convergence error

Details

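The variance approximations available in this function are described below, using the notation of Matei and Tillé (2005): U denotes the population of N units, y_i the value of the variable of interest for unit i, \pi_i its first-order inclusion probability, and n the sample size.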

Hájek variance approximation (method="Hajek1"):

\tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2}(y_i - y_i^*)^2

where

y_i^* = \pi_i \frac{ \sum_{j\in U} b_j y_j/\pi_j }{ \sum_{j \in U} b_j }

and

b_i = \frac{ \pi_i(1-\pi_i)N }{ N-1 }
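
A minimal base-R sketch of the Hájek approximation, written directly from the three formulas above. The helper name hajek1_sketch is hypothetical and is not the package's internal code; it assumes y and pik are given for every population unit.

```r
# Hypothetical helper: Hajek variance approximation (method = "Hajek1"),
# computed term by term from the formulas above.
hajek1_sketch <- function(y, pik) {
  N <- length(pik)
  b <- pik * (1 - pik) * N / (N - 1)          # b_i
  ystar <- pik * sum(b * y / pik) / sum(b)    # y_i^*
  sum(b / pik^2 * (y - ystar)^2)              # approximated variance
}

pik <- c(0.2, 0.4, 0.6, 0.8)
hajek1_sketch(c(1, 5, 2, 7), pik)
```

If y is exactly proportional to pik, every y_i equals y_i^* and the sketch returns zero (up to rounding), as expected for a fixed-size design.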

Starting from Hájek (1964), Brewer (2002) defined the following approximation (method="Hajek2"):

\tilde{Var} = \sum_{i \in U} \pi_i(1-\pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2

where \tilde{Y} = \sum_{i \in U} a_i y_i and a_i = n(1-\pi_i)/\sum_{j \in U} \pi_j(1-\pi_j).
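
The same approximation as a short base-R sketch; hajek2_sketch is a hypothetical helper that mirrors the formula above, not the package's code.

```r
# Hypothetical helper: Brewer's version of the Hajek approximation
# (method = "Hajek2"), following the formula above.
hajek2_sketch <- function(y, pik, n) {
  d <- sum(pik * (1 - pik))          # sum_j pi_j (1 - pi_j)
  a <- n * (1 - pik) / d             # a_i
  Ytilde <- sum(a * y)               # \tilde{Y}
  sum(pik * (1 - pik) * (y / pik - Ytilde / n)^2)
}

hajek2_sketch(c(1, 5, 2, 7), c(0.2, 0.4, 0.6, 0.8), n = 2)
```

Working through the formulas as written, n cancels: \tilde{Y}/n = \sum_i (1-\pi_i) y_i / \sum_j \pi_j(1-\pi_j), which is also the ratio y_i^*/\pi_i in "Hajek1". Hence the "Hajek1" formula equals this one multiplied by N/(N-1).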

Hartley and Rao (1962) variance approximation (method="HartleyRao1"):

\tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2

\qquad - \frac{n-1}{n^2} \sum_{i \in U} \Biggl( 2\pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Biggr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2

\qquad + \frac{2(n-1)}{n^3} \Biggl( \sum_{i \in U}\pi_i y_i - \frac{Y}{n}\sum_{i\in U} \pi_i^2 \Biggr)^2

where Y = \sum_{i \in U} y_i is the population total.
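
A base-R sketch of the three terms, assuming sum(pik) equals n as in a fixed-size design; hartley_rao1_sketch is a hypothetical helper written from the formula above, not the package's implementation.

```r
# Hypothetical helper: Hartley-Rao approximation (method = "HartleyRao1").
hartley_rao1_sketch <- function(y, pik, n) {
  Y  <- sum(y)                         # population total
  e2 <- (y / pik - Y / n)^2            # squared deviations
  s2 <- sum(pik^2)                     # sum_j pi_j^2
  t1 <- sum(pik * (1 - (n - 1) / n * pik) * e2)
  t2 <- (n - 1) / n^2 * sum((2 * pik^3 - pik^2 / 2 * s2) * e2)
  t3 <- 2 * (n - 1) / n^3 * (sum(pik * y) - Y / n * s2)^2
  t1 - t2 + t3
}

hartley_rao1_sketch(c(1, 5, 2, 7), c(0.2, 0.4, 0.6, 0.8), n = 2)
```

Keeping the terms separate makes it easy to check that, for n = 1, the second and third terms vanish.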

Hartley and Rao (1962) also provide a simplified version, which keeps only the first term of the approximation above (method="HartleyRao2"):

\tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2
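
A one-line base-R sketch of the simplified formula; hartley_rao2_sketch is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: simplified Hartley-Rao approximation
# (method = "HartleyRao2"), i.e. the first term of "HartleyRao1".
hartley_rao2_sketch <- function(y, pik, n) {
  Y <- sum(y)                          # population total
  sum(pik * (1 - (n - 1) / n * pik) * (y / pik - Y / n)^2)
}

hartley_rao2_sketch(c(1, 3), c(0.5, 0.5), n = 1)
```

For example, with two units, pik = (0.5, 0.5), n = 1 and y = (1, 3): Y/n = 4, the deviations y_i/\pi_i - Y/n are -2 and 2, and the factor 1 - 0 \cdot \pi_i equals 1, so the approximation is 0.5 \cdot 4 + 0.5 \cdot 4 = 4.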

The option method="FixedPoint" computes the fixed-point variance approximation proposed by Deville and Tillé (2005). The variance has the same form as for method="Hajek1", but the coefficients b_i are obtained iteratively as the solution of \pi_i(1-\pi_i) = b_i - b_i^2 / \sum_{j\in U} b_j, using the algorithm:
1. b_i^{(0)} = \pi_i (1-\pi_i) \frac{N}{N-1}, \,\, \forall i \in U
2. b_i^{(k)} = \frac{(b_i^{(k-1)})^2 }{\sum_{j\in U} b_j^{(k-1)} } + \pi_i(1-\pi_i), for k = 1, 2, \dots, until convergence (controlled by eps) or until maxIter iterations are reached.

A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:

b_i = \pi_i(1-\pi_i)\Biggl( \frac{N\pi_i(1-\pi_i)}{ (N-1)\sum_{j\in U}\pi_j(1-\pi_j) } + 1 \Biggr)
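
The iteration can be sketched in base R as below. The helper names are hypothetical, and two details are assumptions: iteration stops when the largest change in b_i falls below eps, and the one-iteration fallback is returned when maxIter is exhausted, whereas the package checks a necessary condition for convergence that is not reproduced here. The fallback is exactly step 2 applied once to b_i^{(0)}, which yields the closed form above.

```r
# Hypothetical helpers sketching method = "FixedPoint". The stopping rule and
# the fallback trigger are assumptions, not the package's code.
fixed_point_b_sketch <- function(pik, maxIter = 1000, eps = 1e-9) {
  N  <- length(pik)
  v  <- pik * (1 - pik)                # pi_i (1 - pi_i)
  b  <- v * N / (N - 1)                # step 1: b^(0)
  b1 <- b^2 / sum(b) + v               # one iteration from b^(0) (fallback)
  for (k in seq_len(maxIter)) {
    b_new <- b^2 / sum(b) + v          # step 2
    if (max(abs(b_new - b)) < eps) return(b_new)
    b <- b_new
  }
  warning("no convergence within maxIter; returning the one-iteration solution")
  b1
}

fixed_point_var_sketch <- function(y, pik, ...) {
  b <- fixed_point_b_sketch(pik, ...)
  ystar <- pik * sum(b * y / pik) / sum(b)   # y_i^* as in "Hajek1"
  sum(b / pik^2 * (y - ystar)^2)
}

fixed_point_var_sketch(c(1, 5, 2, 7), c(0.2, 0.4, 0.6, 0.8))
```

With equal inclusion probabilities, b^{(0)} is already a fixed point (b_i^{(0)}/N + \pi_i(1-\pi_i) = b_i^{(0)}), so in that case this approximation coincides with "Hajek1".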

Value

A scalar: the approximated variance.

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Examples


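N <- 500; n <- 50

set.seed(0)
# auxiliary variable x and a variable of interest y related to x
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

# first-order inclusion probabilities proportional to x (they sum to n)
pik  <- n * x/sum(x)
# matrix of products pi_i * pi_j with pi_i on the diagonal;
# not needed by Var_approx, which uses first-order probabilities only
pikl <- outer(pik, pik, '*'); diag(pikl) <- pik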

### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")

[Package UPSvarApprox version 0.1.4 Index]