R: Horvitz-Thompson Estimator

htestimate {samplingbook}

R Documentation

Horvitz-Thompson Estimator

Description

Calculates the Horvitz-Thompson estimate of the population mean, with a choice of variance estimation methods: Yates and Grundy, Horvitz-Thompson, Hansen-Hurwitz, and Hajek.

Usage

htestimate(y, N, PI, pk, pik, method = 'yg')

Arguments

`y`	vector of observations
`N`	integer for population size
`PI`	square matrix of second order inclusion probabilities with `n` rows and columns. Required for variance estimation by methods `'ht'` and `'yg'`.
`pk`	vector of first order inclusion probabilities of length `n` for the sample elements. Required for variance estimation by methods `'hh'` and `'ha'`.
`pik`	an optional vector of first order inclusion probabilities of length `N` for the population elements. It can be used for variance estimation by method `'ha'`.
`method`	method to be used for variance estimation. Exact options are `'yg'` (Yates and Grundy) and `'ht'` (Horvitz-Thompson); approximate options are `'hh'` (Hansen-Hurwitz) and `'ha'` (Hajek).

Details

Methods 'yg' and 'ht' require the matrix PI; methods 'hh' and 'ha' require the vector pk of inclusion probabilities. For the Hajek method 'ha', pik can be specified in addition. Otherwise, an approximate Hajek method is used.
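
The quantities involved can be sketched by hand. The block below is a minimal illustration that assumes the standard textbook definitions of the Horvitz-Thompson mean and the Yates-Grundy variance estimator for a fixed-size design; it is not taken from the package source. The toy data and the names `ht_mean`, `yg_se` and `srs_se` are illustrative assumptions.

```r
# Toy design (illustrative values, not from the package):
# simple random sampling without replacement, n = 3 out of N = 5.
N <- 5
y <- c(2, 4, 9)
n <- length(y)

# Second order inclusion probabilities pi_kl for the sampled units;
# the diagonal holds the first order probabilities pi_k.
PI <- matrix(n * (n - 1) / (N * (N - 1)), nrow = n, ncol = n)
diag(PI) <- n / N
pk <- diag(PI)

# Horvitz-Thompson estimate of the population mean:
# weight each observation by 1 / pi_k, sum, and divide by N.
ht_mean <- sum(y / pk) / N

# Yates-Grundy variance estimate of the total, rescaled to the mean.
# Diagonal terms drop out because z_k - z_k = 0.
z <- y / pk
delta <- (outer(pk, pk) - PI) / PI
yg_var_total <- 0.5 * sum(delta * outer(z, z, "-")^2)
yg_se <- sqrt(yg_var_total) / N

# Under simple random sampling these reduce to the sample mean and
# the usual standard error with finite population correction.
srs_se <- sqrt((1 - n / N) * var(y) / n)
c(ht_mean = ht_mean, yg_se = yg_se, srs_se = srs_se)
```

Simple random sampling is used here only because its inclusion probabilities are known in closed form, which makes the result easy to check against the sample mean and the classical standard error. With unequal probabilities, `PI` would instead come from the sampling design.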

Value

The function htestimate returns a list with the following components:

`call`	list of call components: `y` observations, `N` population size, `PI` second order inclusion probabilities, `pk` inclusion probabilities of the sample elements, `pik` inclusion probabilities of the population elements, and `method` method for variance estimation
`mean`	mean estimate
`se`	standard error of the mean estimate

Author(s)

Juliane Manitz

References

Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.

Examples

data(influenza)
summary(influenza)

# draw a sample of size 20 with pps.sampling()
set.seed(108506)
pps <- pps.sampling(z=influenza$population, n=20, method='midzuno')
sample <- influenza[pps$sample,]
# estimate the mean with htestimate()
N <- nrow(influenza)
# exact variance estimates (require PI)
PI <- pps$PI
htestimate(sample$cases, N=N, PI=PI, method='yg')
htestimate(sample$cases, N=N, PI=PI, method='ht')
# approximate variance estimates (require pk)
pk <- pps$pik[pps$sample]
htestimate(sample$cases, N=N, pk=pk, method='hh')
pik <- pps$pik
htestimate(sample$cases, N=N, pk=pk, pik=pik, method='ha')
# without pik, an approximate Hajek method is used
htestimate(sample$cases, N=N, pk=pk, method='ha')
# normal-based 95% confidence interval for the total number of cases
est.ht <- htestimate(sample$cases, N=N, PI=PI, method='ht')
# estimated total: mean estimate scaled by N
est.ht$mean*N
lower <- est.ht$mean*N - qnorm(0.975)*N*est.ht$se
upper <- est.ht$mean*N + qnorm(0.975)*N*est.ht$se
c(lower, upper)
# true number of influenza cases
sum(influenza$cases)

[Package samplingbook version 1.2.4]