variance_est {vardpoor}

R Documentation

Variance estimation for sample surveys by the ultimate cluster method

Description

Computes the variance estimate using the ultimate cluster method.

Usage

variance_est(
  Y,
  H,
  PSU,
  w_final,
  N_h = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  PSU_sort = NULL,
  period = NULL,
  dataset = NULL,
  msg = "",
  checking = TRUE
)

Arguments

`Y`	Variables of interest. Object convertible to `data.table`, or variable names given as character strings or column numbers.
`H`	The unit stratum variable. One-dimensional object convertible to a one-column `data.table`, or a variable name given as a character string or column number.
`PSU`	Primary sampling unit variable. One-dimensional object convertible to a one-column `data.table`, or a variable name given as a character string or column number.
`w_final`	Weight variable. One-dimensional object convertible to a one-column `data.table`, or a variable name given as a character string or column number.
`N_h`	Number of primary sampling units in the population for each stratum (and period, if `period` is not `NULL`). If `N_h = NULL` and `fh_zero = FALSE` (the default), `N_h` is estimated from the sample data as the sum of weights (`w_final`) in each stratum (and period, if `period` is not `NULL`). It is optional for a single-stage sampling design, since it can be estimated from the sample data. It is recommended for a multi-stage sampling design, since `N_h` cannot be estimated correctly from the sample data in that case. If `N_h` is not supplied for a multi-stage sampling design (for example, because the information is not available), it is advisable to set `fh_zero = TRUE`. If `period` is `NULL`: a two-column matrix with one row per stratum; the first column contains the stratum code and the second column the number of primary sampling units in the population of that stratum. If `period` is not `NULL`: a three-column matrix with one row per combination of stratum and period; the first column contains the period, the second column the stratum code, and the third column the number of primary sampling units in the population of that stratum and period.
`fh_zero`	Logical, `FALSE` by default. If `FALSE`, `fh` is calculated as n_h divided by N_h in each stratum; if `TRUE`, `fh` is set to zero in every stratum.
`PSU_level`	Logical, `TRUE` by default. If `TRUE`, `fh` in each stratum is the number of PSUs in the sample (n_h) divided by the number of PSUs in the frame (N_h). If `FALSE`, `fh` in each stratum is the number of units in the sample (n_h) divided by the number of units in the frame (N_h), which is calculated as the sum of weights.
`PSU_sort`	Optional. If `PSU_sort` is defined, the variance is calculated for a systematic sample.
`period`	Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to `data.table`, or variable names given as character strings or column numbers.
`dataset`	Optional name of the individual dataset (a `data.table`).
`msg`	Optional text printed when the function reports an error.
`checking`	Logical, `TRUE` by default. If `TRUE`, the function checks the input data for preparation errors; otherwise no checks are made.
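
The layout of `N_h` and the two ways of forming `fh` described for `PSU_level` can be illustrated with a small base R sketch. It only mirrors the argument descriptions above and is not taken from the package internals. The data, the frame counts, and the object names `smp`, `fh_psu`, and `fh_unit` are all made up for illustration, and passing `N_h` as a `data.frame` rather than a matrix is an assumption.

```r
# Toy sample: two strata, PSUs nested in strata, one row per sampled unit.
# All values are invented for illustration.
smp <- data.frame(
  H       = c("A", "A", "A", "A", "B", "B", "B"),
  PSU     = c(1, 1, 2, 3, 4, 5, 5),
  w_final = c(10, 10, 12, 8, 5, 6, 6)
)

# Hypothetical frame counts of PSUs per stratum, laid out as described for
# N_h when period is NULL: stratum code first, PSU count second.
N_h <- data.frame(H = c("A", "B"), N_h = c(30, 12))

# PSU-level sampling fraction (as described for PSU_level = TRUE):
# number of sampled PSUs divided by number of PSUs in the frame.
n_psu  <- tapply(smp$PSU, smp$H, function(x) length(unique(x)))
fh_psu <- n_psu[N_h$H] / N_h$N_h

# Unit-level sampling fraction (as described for PSU_level = FALSE):
# number of sampled units divided by the sum of weights in the stratum.
n_units <- tapply(smp$w_final, smp$H, length)
N_units <- tapply(smp$w_final, smp$H, sum)
fh_unit <- n_units / N_units

print(fh_psu)
print(fh_unit)
```

Setting `fh_zero = TRUE` corresponds to replacing either fraction with zero in every stratum.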

Details

If we assume that n_h \geq 2 for all h, that is, two or more PSUs are selected from each stratum, then the variance of \hat{\theta} can be estimated from the variation among the estimated PSU totals of the variable z:

\hat{V} \left(\hat{\theta} \right)=\sum\limits_{h=1}^{H} \left(1-f_h \right) \frac{n_h}{n_{h}-1} \sum\limits_{i=1}^{n_h} \left( z_{hi\bullet}-\bar{z}_{h\bullet\bullet}\right)^2,

where

• z_{hi\bullet}=\sum\limits_{j=1}^{m_{hi}} w_{hij} z_{hij} is the estimated total of z for PSU i of stratum h

• \bar{z}_{h\bullet\bullet}=\frac{\sum\limits_{i=1}^{n_h} z_{hi\bullet}}{n_h} is the mean of the estimated PSU totals in stratum h

• f_h is the sampling fraction of PSUs within stratum h

• h is the stratum number, with a total of H strata

• i is the primary sampling unit (PSU) number within stratum h, with a total of n_h PSUs

• j is the household number within cluster i of stratum h, with a total of m_{hi} households

• w_{hij} is the sampling weight for household j in PSU i of stratum h

• z_{hij} denotes the observed value of the analysis variable z for household j in PSU i of stratum h
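
The formula can be written out directly in base R. The following is a minimal sketch for a single variable, not the implementation used by `variance_est`. The helper name `ultimate_cluster_var` and its arguments are hypothetical, and `fh` is assumed to be either a single number or a vector named by stratum.

```r
# Hypothetical helper: ultimate cluster variance estimate for one variable z.
#   z, w, H, PSU : vectors of equal length (value, weight, stratum, PSU id)
#   fh           : sampling fraction, a single number or a vector named by stratum
ultimate_cluster_var <- function(z, w, H, PSU, fh = 0) {
  # Weighted PSU totals: sum of w * z over the units of each PSU.
  psu_tot <- aggregate(data.frame(z_hi = w * z),
                       by = list(H = H, PSU = PSU), FUN = sum)

  # Per stratum: n_h / (n_h - 1) * sum of squared deviations of the PSU
  # totals from their stratum mean.
  per_stratum <- tapply(psu_tot$z_hi, psu_tot$H, function(tot) {
    n_h <- length(tot)
    stopifnot(n_h >= 2)  # the formula needs at least two PSUs per stratum
    n_h / (n_h - 1) * sum((tot - mean(tot))^2)
  })

  # Align a per-stratum fh vector with the stratum order used above.
  if (length(fh) > 1) fh <- fh[names(per_stratum)]

  # Apply the finite population correction (1 - f_h) and add up the strata.
  sum((1 - fh) * per_stratum)
}

# Toy data: one stratum, four single-unit PSUs, constant weight 2.
z   <- c(1, 2, 3, 4)
w   <- rep(2, 4)
H   <- rep("Strata_1", 4)
PSU <- 1:4

v_no_fpc <- ultimate_cluster_var(z, w, H, PSU)            # fh = 0
v_fpc    <- ultimate_cluster_var(z, w, H, PSU, fh = 0.5)  # fh = 4 / sum(w)
```

With `fh = 0` the correction factor drops out, so the second result is exactly half of the first in this toy case.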

Value

A `data.table` containing the estimated variances of the totals.

References

Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second onwards? 2012
Eurostat Methodologies and Working papers, Standard error estimation for the EU-SILC indicators of poverty and social exclusion, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

Examples

Ys <- rchisq(10, 3)
w <- rep(2, 10)
PSU <- 1 : length(Ys)
H <- rep("Strata_1", 10)

# default (fh_zero = FALSE): the finite population correction (1 - f_h) is applied
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w)


## Not run: 
 # fh_zero = FALSE: the finite population correction (1 - f_h) is applied
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = FALSE)
 
 # fh_zero = TRUE: f_h is set to zero, so no finite population correction is applied
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = TRUE)
 
## End(Not run)

[Package vardpoor version 0.20.1 Index]