Pkl.Hajek.s {samplingVarEst}    R Documentation

The Hajek approximation for the 2nd order (joint) inclusion probabilities (sample based)

Description

Computes the Hajek (1964) approximation for the 2nd order (joint) inclusion probabilities utilising only sample-based quantities.

Usage

Pkl.Hajek.s(VecPk.s)

Arguments

VecPk.s    Vector of the first-order inclusion probabilities of the sampled elements; its length equals the sample size. Values in VecPk.s must be greater than zero and less than or equal to one, and missing values are not allowed.
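The requirements on VecPk.s can be checked before any approximation is computed. The sketch below is a hypothetical helper, check_vecpk_s, written only for illustration. It is not part of samplingVarEst, and the package's own internal checks may differ.

```r
# Hypothetical helper (NOT part of samplingVarEst): enforces the documented
# requirements on VecPk.s, i.e. numeric, no missing values, 0 < pi_k <= 1.
check_vecpk_s <- function(VecPk.s) {
  if (!is.numeric(VecPk.s)) stop("VecPk.s must be numeric")
  if (anyNA(VecPk.s)) stop("VecPk.s must not contain missing values")
  if (any(VecPk.s <= 0 | VecPk.s > 1)) stop("values in VecPk.s must lie in (0, 1]")
  invisible(TRUE)
}

check_vecpk_s(c(0.2, 0.5, 0.8))   # passes silently
try(check_vecpk_s(c(0.2, NA, 0.8)))  # error: missing value
try(check_vecpk_s(c(0, 0.5)))        # error: a probability equal to zero
```

The missing-value check runs first so that the range comparison never has to deal with NA.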

Details

Let $\pi_k$ denote the inclusion probability of the $k$-th element in the sample $s$, and let $\pi_{kl}$ denote the joint inclusion probability of the $k$-th and $l$-th elements in $s$. If the joint inclusion probabilities $\pi_{kl}$ are not available, the Hajek (1964) approximation can be used. Note that this approximation is designed for large-entropy sampling designs, large samples, and large populations, so care should be taken with highly stratified samples (see, e.g., Berger 2005).

The sample-based version of the Hajek (1964) approximation for the joint inclusion probabilities $\pi_{kl}$, as implemented by this function, is:

$$\pi_{kl} \doteq \pi_k \pi_l \left\{1 - \hat{d}^{-1}(1-\pi_k)(1-\pi_l)\right\}$$

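where $\hat{d} = \sum_{k\in s}(1-\pi_k)$. This is the Horvitz-Thompson estimator of $d = \sum_{k\in U}\pi_k(1-\pi_k)$, the quantity used in Hajek's population-level approximation, where $U$ denotes the population.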
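As a small worked example, take three hypothetical sampled elements with $\pi = (0.2, 0.5, 0.8)$. These values are chosen only to illustrate the arithmetic; with so few elements the approximation itself would not be expected to be accurate.

```latex
\begin{aligned}
\hat{d}    &= (1-0.2) + (1-0.5) + (1-0.8) = 1.5,\\
\pi_{12}   &\doteq 0.2 \times 0.5 \left\{1 - \frac{(0.8)(0.5)}{1.5}\right\} \approx 0.0733,\\
\pi_{13}   &\doteq 0.2 \times 0.8 \left\{1 - \frac{(0.8)(0.2)}{1.5}\right\} \approx 0.1429,\\
\pi_{23}   &\doteq 0.5 \times 0.8 \left\{1 - \frac{(0.5)(0.2)}{1.5}\right\} \approx 0.3733.
\end{aligned}
```

Each approximate $\pi_{kl}$ lies at or below the product $\pi_k\pi_l$, because the correction term $\hat{d}^{-1}(1-\pi_k)(1-\pi_l)$ is non-negative.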

The approximation was originally developed for $d \rightarrow \infty$ under the maximum-entropy sampling design, i.e. rejective sampling (see Hajek 1981, Theorem 3.3, Ch. 3 and 6). It therefore requires the sampling design in use to be of large entropy. An overview can be found in Berger and Tille (2009). Accounts of different sampling designs, $\pi_{kl}$ approximations, and approximate variances under large-entropy designs can be found in Tille (2006), Brewer and Donadio (2003), and Haziza, Mecatti and Rao (2008). Berger (2011) gave sufficient conditions under which Hajek's results still hold for large-entropy sampling designs other than the maximum-entropy one.

Value

The function returns an $n \times n$ square matrix of estimated joint inclusion probabilities, where $n$ is the sample size.
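A minimal base-R sketch of how such a matrix could be built from the formula above is shown below. The function name hajek_pkl_sketch is hypothetical, and this is not the samplingVarEst implementation. Two choices are assumptions of the sketch only: the diagonal is set to $\pi_k$ (since $\pi_{kk} = \pi_k$ by definition), and the function stops when $\hat{d} = 0$, which happens when every $\pi_k = 1$ and makes the formula undefined.

```r
# Hypothetical sketch (NOT the samplingVarEst code) of the sample-based
# Hajek approximation:
#   pi_kl ~= pi_k * pi_l * {1 - (1 - pi_k)(1 - pi_l) / d_hat},
#   d_hat  = sum over k in s of (1 - pi_k).
hajek_pkl_sketch <- function(pk) {
  stopifnot(is.numeric(pk), !anyNA(pk), all(pk > 0 & pk <= 1))
  q <- 1 - pk                      # complements 1 - pi_k
  d_hat <- sum(q)                  # sample-based d
  if (d_hat == 0) stop("d_hat is zero: all inclusion probabilities equal 1")
  # All pairwise terms at once: n x n matrix
  pkl <- outer(pk, pk) * (1 - outer(q, q) / d_hat)
  diag(pkl) <- pk                  # assumption: pi_kk = pi_k
  pkl
}

pk <- c(0.2, 0.5, 0.8)
P  <- hajek_pkl_sketch(pk)
dim(P)             # [1] 3 3
isSymmetric(P)     # [1] TRUE
round(P[1, 2], 4)  # [1] 0.0733
```

Using outer() forms every pairwise product without explicit loops, so the result is symmetric by construction. Memory grows as $n^2$, which matters only for very large samples.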

Author(s)

Emilio Lopez Escobar.

References

Berger, Y. G. (2005) Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.

Berger, Y. G. (2011) Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.

Berger, Y. G. and Tille, Y. (2009) Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.

Brewer, K. R. W. and Donadio, M. E. (2003) The large entropy variance of the Horvitz-Thompson estimator. Survey Methodology, 29, 189–196.

Hajek, J. (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.

Hajek, J. (1981) Sampling From a Finite Population. Dekker, New York.

Haziza, D., Mecatti, F. and Rao, J. N. K. (2008) Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.

Tille, Y. (2006) Sampling Algorithms. Springer, New York.

Examples

library(samplingVarEst)                      #Loads the package
data(oaxaca)                                 #Loads the Oaxaca municipalities dataset
pik.U  <- Pk.PropNorm.U(373, oaxaca$HOMES00) #Reconstructs the 1st order incl. probs.
s      <- oaxaca$sHOMES00                    #Defines the sample to be used
#This approximation is only suitable for large-entropy sampling designs
pikl.s <- Pkl.Hajek.s(pik.U[s==1])           #Approx. 2nd order incl. probs. from s
#First 5 rows/cols of (sample-based) 2nd order incl. probs. matrix
pikl.s[1:5, 1:5]

[Package samplingVarEst version 1.5 Index]