R: Approximate Joint-Inclusion Probabilities

jip_approx {jipApprox}

R Documentation

Approximate Joint-Inclusion Probabilities

Description

Approximations of joint-inclusion probabilities by means of first-order inclusion probabilities.

Usage

jip_approx(pik, method)

Arguments

`pik`	numeric vector of first-order inclusion probabilities for all population units.
`method`	string representing one of the available approximation methods.

Details

Available methods are "Hajek", "HartleyRao", "Tille", "Brewer1", "Brewer2", "Brewer3", and "Brewer4". These methods were derived for high-entropy sampling designs, so they may be less accurate under other designs.

The Hájek (1964) approximation [method="Hajek"] is derived under the maximum entropy sampling design and is given by

\tilde{\pi}_{ij} = \pi_i\pi_j \biggl( 1 - \frac{(1-\pi_i)(1-\pi_j)}{d} \biggr)

where d = \sum_{k\in U} \pi_k(1-\pi_k).
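As an illustration, the formula \pi_i\pi_j [1 - (1-\pi_i)(1-\pi_j)/d] can be evaluated directly with outer products. The function name `hajek_sketch` is hypothetical and is not part of the package; this is a minimal sketch of the formula, not necessarily how jip_approx computes it internally.

```r
# Hypothetical helper (not part of jipApprox): direct evaluation of the
# Hajek approximation pi_i * pi_j * (1 - (1 - pi_i)(1 - pi_j) / d).
hajek_sketch <- function(pik) {
  d <- sum(pik * (1 - pik))
  # outer() builds the N x N matrices of pairwise products
  pikl <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / d)
  diag(pikl) <- pik  # first-order probabilities on the diagonal
  pikl
}

pik  <- c(0.2, 0.4, 0.6, 0.8)  # toy probabilities summing to n = 2
pikl <- hajek_sketch(pik)
```

The diagonal is overwritten with `pik` so that the result has the same layout as described under Value.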

Hartley and Rao (1962) proposed the following approximation under randomised systematic sampling [method="HartleyRao"]:

\tilde{\pi}_{ij} = \frac{n-1}{n} \pi_i\pi_j + \frac{n-1}{n^2} (\pi_i^2 \pi_j + \pi_i \pi_j^2) - \frac{n-1}{n^3}\pi_i\pi_j \sum_{k\in U} \pi_k^2

+ \frac{2(n-1)}{n^3} (\pi_i^3 \pi_j + \pi_i\pi_j^3 + \pi_i^2 \pi_j^2) - \frac{3(n-1)}{n^4} (\pi_i^2 \pi_j + \pi_i\pi_j^2) \sum_{k \in U}\pi_k^2

+ \frac{3(n-1)}{n^5} \pi_i\pi_j \biggl( \sum_{k\in U} \pi_k^2 \biggr)^2 - \frac{2(n-1)}{n^4} \pi_i\pi_j \sum_{k \in U} \pi_k^3
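The seven terms of the Hartley and Rao expansion can be evaluated term by term, which may help when checking the formula by hand. The sketch below assumes a fixed-size design, so that n = sum(pik), and the name `hartley_rao_sketch` is hypothetical rather than part of the package.

```r
# Hypothetical helper (not part of jipApprox): term-by-term evaluation of
# the Hartley-Rao approximation. Assumes a fixed-size design, n = sum(pik).
hartley_rao_sketch <- function(pik) {
  n  <- sum(pik)
  s2 <- sum(pik^2)   # sum over U of pi_k^2
  s3 <- sum(pik^3)   # sum over U of pi_k^3
  pp <- outer(pik, pik)                         # pi_i * pi_j
  a  <- outer(pik^2, pik) + outer(pik, pik^2)   # pi_i^2 pi_j + pi_i pi_j^2
  b  <- outer(pik^3, pik) + outer(pik, pik^3) + outer(pik^2, pik^2)
  pikl <- (n - 1) / n   * pp +
          (n - 1) / n^2 * a -
          (n - 1) / n^3 * pp * s2 +
          2 * (n - 1) / n^3 * b -
          3 * (n - 1) / n^4 * a * s2 +
          3 * (n - 1) / n^5 * pp * s2^2 -
          2 * (n - 1) / n^4 * pp * s3
  diag(pikl) <- pik  # first-order probabilities on the diagonal
  pikl
}

# Equal probabilities: the exact value is n(n-1) / (N(N-1)) = 20/380
pik_eq  <- rep(5 / 20, 20)
pikl_eq <- hartley_rao_sketch(pik_eq)
```

With equal probabilities the approximation can be compared against the exact joint-inclusion probability of simple random sampling, which gives a quick sanity check of the coefficients.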

Tillé (1996) proposed the approximation \tilde{\pi}_{ij} = \beta_i\beta_j, where the coefficients \beta_i are computed iteratively through the following procedure [method="Tille"]:

\beta_i^{(0)} = \pi_i, \,\, \forall i\in U
\beta_i^{(2k-1)} = \frac{(n-1)\pi_i}{\beta^{(2k-2)} - \beta_i^{(2k-2)}}
\beta_i^{(2k)} = \beta_i^{(2k-1)} \Biggl( \frac{n(n-1)}{(\beta^{(2k-1)})^2 - \sum_{j\in U} (\beta_j^{(2k-1)})^2 } \Biggr)^{1/2}

with \beta^{(m)} = \sum_{i\in U} \beta_i^{(m)} denoting the sum of the coefficients at step m, and k=1,2,3, \dots
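The iteration alternates an update step (odd index) with a rescaling step (even index) until the coefficients stabilise. The following is a minimal sketch of that loop; the function name `tille_sketch`, the tolerance `eps`, the cap `maxit` and the stopping rule are assumptions made for illustration, and the package may use different ones.

```r
# Hypothetical helper (not part of jipApprox): fixed-point iteration for the
# beta coefficients. eps and maxit are illustrative assumptions.
tille_sketch <- function(pik, eps = 1e-10, maxit = 1000) {
  n <- sum(pik)        # assumes a fixed-size design
  b <- pik             # beta^(0)
  for (k in seq_len(maxit)) {
    # step 2k-1: update each coefficient using the sum of the others
    b_odd <- (n - 1) * pik / (sum(b) - b)
    # step 2k: rescale so the off-diagonal products sum to n(n-1)
    b_even <- b_odd * sqrt(n * (n - 1) / (sum(b_odd)^2 - sum(b_odd^2)))
    converged <- max(abs(b_even - b)) < eps
    b <- b_even
    if (converged) break
  }
  pikl <- outer(b, b)  # pi_ij approximated by beta_i * beta_j
  diag(pikl) <- pik    # first-order probabilities on the diagonal
  pikl
}

# Equal probabilities: the fixed point reproduces n(n-1) / (N(N-1))
pik_eq  <- rep(5 / 20, 20)
pikl_eq <- tille_sketch(pik_eq)
```

At a fixed point each coefficient satisfies \beta_i (\beta - \beta_i) = (n-1)\pi_i, so the off-diagonal entries of each row sum to (n-1)\pi_i.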

Finally, Brewer (2002) and Brewer and Donadio (2003) proposed four approximations, which are defined by the general form

\tilde{\pi}_{ij} = \pi_i\pi_j (c_i + c_j)/2

where the c_i determine the approximation used:

Equation (9) [method="Brewer1"]:

c_i = (n-1) / (n-\pi_i)

Equation (10) [method="Brewer2"]:

c_i = (n-1) / \Bigl(n - n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)

Equation (11) [method="Brewer3"]:

c_i = (n-1) / \Bigl(n - 2\pi_i + n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)

Equation (18) [method="Brewer4"]:

c_i = (n-1) / \Bigl(n - (2n-1)(n-1)^{-1}\pi_i + (n-1)^{-1}\sum_{k\in U}\pi_k^2 \Bigr)
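Because the four Brewer variants differ only in c_i, they can share a single routine built on the general form \pi_i\pi_j (c_i + c_j)/2. The sketch below is illustrative: the function name `brewer_sketch` and its `version` argument are hypothetical and not part of the package interface, and n is taken as sum(pik).

```r
# Hypothetical helper (not part of jipApprox): the four Brewer variants share
# the form pi_i * pi_j * (c_i + c_j) / 2 and differ only in c_i.
brewer_sketch <- function(pik, version = 1) {
  n  <- sum(pik)     # assumes a fixed-size design
  s2 <- sum(pik^2)   # sum over U of pi_k^2
  ci <- switch(as.character(version),
    "1" = (n - 1) / (n - pik),
    "2" = rep((n - 1) / (n - s2 / n), length(pik)),
    "3" = (n - 1) / (n - 2 * pik + s2 / n),
    "4" = (n - 1) / (n - (2 * n - 1) / (n - 1) * pik + s2 / (n - 1)),
    stop("version must be 1, 2, 3 or 4")
  )
  # outer(ci, ci, "+") gives the matrix of c_i + c_j
  pikl <- outer(pik, pik) * outer(ci, ci, "+") / 2
  diag(pikl) <- pik  # first-order probabilities on the diagonal
  pikl
}

# With equal probabilities all four variants coincide
pik_eq <- rep(5 / 20, 20)
brewer_eq <- lapply(1:4, function(v) brewer_sketch(pik_eq, v))
```

With equal probabilities every variant reduces to c_i = (n-1)/(n - n/N), so all four return n(n-1)/(N(N-1)) off the diagonal.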

Value

A symmetric matrix of joint-inclusion probabilities, whose diagonal is the vector of first-order inclusion probabilities.

References

Hartley, H.O.; Rao, J.N.K., 1962. Sampling With Unequal Probability and Without Replacement. The Annals of Mathematical Statistics 33 (2), 350-374.

Hájek, J., 1964. Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population. The Annals of Mathematical Statistics 35 (4), 1491-1523.

Tillé, Y., 1996. Some Remarks on Unequal Probability Sampling Designs Without Replacement. Annals of Economics and Statistics 44, 177-189.

Brewer, K.R.W.; Donadio, M.E., 2003. The High Entropy Variance of the Horvitz-Thompson Estimator. Survey Methodology 29 (2), 189-196.

Examples


### Generate population data ---
N <- 20; n <- 5

set.seed(0)
x <- rgamma(N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)

### Approximate joint-inclusion probabilities ---
pikl <- jip_approx(pik, method='Hajek')
pikl <- jip_approx(pik, method='HartleyRao')
pikl <- jip_approx(pik, method='Tille')
pikl <- jip_approx(pik, method='Brewer1')
pikl <- jip_approx(pik, method='Brewer2')
pikl <- jip_approx(pik, method='Brewer3')
pikl <- jip_approx(pik, method='Brewer4')

[Package jipApprox version 0.1.5 Index]