variance_est {vardpoor}    R Documentation

Variance estimation for sample surveys by the ultimate cluster method

Description

Computes variance estimates by the ultimate cluster method.

Usage

variance_est(
  Y,
  H,
  PSU,
  w_final,
  N_h = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  PSU_sort = NULL,
  period = NULL,
  dataset = NULL,
  msg = "",
  checking = TRUE
)

Arguments

Y

Variables of interest. Object convertible to data.table or variable names as character, column numbers.

H

The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

PSU

Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

w_final

Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

N_h

Number of primary sampling units in the population for each stratum (and period, if period is not NULL). If N_h = NULL and fh_zero = FALSE (the default), N_h is estimated from the sample data as the sum of weights (w_final) in each stratum (and period, if period is not NULL). N_h is optional for a single-stage sampling design, as it can be estimated from the sample data. It is recommended for a multi-stage sampling design, as N_h cannot be correctly estimated from the sample data in that case. If N_h is not supplied for a multi-stage sampling design (for example, because the information is not available), it is advisable to set fh_zero = TRUE.

If period is NULL: a two-column matrix with one row per stratum. The first column should contain the stratum code, the second the number of primary sampling units in the population of that stratum.

If period is not NULL: a three-column matrix with one row per combination of stratum and period. The first column should contain the period, the second the stratum code, and the third the number of primary sampling units in the population of that stratum and period.

fh_zero

By default FALSE: fh is calculated as n_h divided by N_h in each stratum. If TRUE, fh is set to zero in each stratum.

PSU_level

By default TRUE. If PSU_level is TRUE, fh in each stratum is calculated as the number of PSUs in the sample (n_h) divided by the number of PSUs in the frame (N_h). If PSU_level is FALSE, fh in each stratum is calculated as the number of units in the sample (n_h) divided by the number of units in the frame (N_h), which is calculated as the sum of weights.

PSU_sort

Optional; if PSU_sort is defined, the variance is calculated for a systematic sample.

period

Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to data.table or variable names as character, column numbers.

dataset

Optional name of the individual-level dataset (a data.table).

msg

Optional text that is printed when the function reports an error.

checking

Optional; if TRUE (the default), the function checks the input for data preparation errors, otherwise no checks are performed.
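The fh_zero and PSU_level arguments described above determine the sampling fraction fh used in each stratum. The following base R sketch illustrates the three cases on a hypothetical toy sample; the data frame smp, the vector N_psu and all values in them are assumptions made for illustration, and the code is not the package's internal implementation.

```r
# Hypothetical toy sample: two strata, some PSUs containing several units
smp <- data.frame(
  H   = c("A", "A", "A", "A", "B", "B", "B"),
  PSU = c(1, 1, 2, 3, 4, 5, 5),
  w   = c(10, 10, 10, 10, 5, 5, 5)
)

# Assumed known number of PSUs in the frame for each stratum
N_psu <- c(A = 20, B = 8)

# PSU level: number of sampled PSUs divided by number of PSUs in the frame
n_psu <- tapply(smp$PSU, smp$H, function(p) length(unique(p)))
f_psu <- n_psu / N_psu[names(n_psu)]

# Unit level: number of sampled units divided by the sum of weights
n_unit <- tapply(smp$w, smp$H, length)
N_unit <- tapply(smp$w, smp$H, sum)
f_unit <- n_unit / N_unit

# fh set to zero in every stratum: no finite population correction
f_zero <- setNames(rep(0, length(n_psu)), names(n_psu))

f_psu   # A = 0.15, B = 0.25
f_unit  # A = 0.10, B = 0.20
```

In this toy sample the two levels give different fractions because the PSUs hold several units each; when every unit is its own PSU and N_h is the sum of weights, they coincide.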

Details

If we assume that $n_h \geq 2$ for all $h$, that is, two or more PSUs are selected from each stratum, then the variance of $\hat{\theta}$ can be estimated from the variation among the estimated PSU totals of the variable $Z$:

$$\hat{V} \left(\hat{\theta} \right)=\sum\limits_{h=1}^{H} \left(1-f_h \right) \frac{n_h}{n_{h}-1} \sum\limits_{i=1}^{n_h} \left( z_{hi\bullet}-\bar{z}_{h\bullet\bullet}\right)^2,$$

where

• $z_{hi\bullet}=\sum\limits_{j=1}^{m_{hi}} \omega_{hij} z_{hij}$

• $\bar{z}_{h\bullet\bullet}=\frac{\left( \sum\limits_{i=1}^{n_h} z_{hi\bullet} \right)}{n_h}$

• $f_h$ is the sampling fraction of PSUs within stratum $h$

• $h$ is the stratum number, with a total of $H$ strata

• $i$ is the primary sampling unit (PSU) number within stratum $h$, with a total of $n_h$ PSUs

• $j$ is the household number within cluster $i$ of stratum $h$, with a total of $m_{hi}$ households

• $\omega_{hij}$ is the sampling weight for household $j$ in PSU $i$ of stratum $h$

• $z_{hij}$ denotes the observed value of the analysis variable $z$ for household $j$ in PSU $i$ of stratum $h$
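The formula above can be traced step by step in base R. The sketch below is illustrative only: the function name ultimate_cluster_var and its argument f_h are hypothetical and not part of vardpoor, and the code makes no claim to reproduce the internals of variance_est. It assumes at least two PSUs per stratum; with a single PSU the factor n_h / (n_h - 1) is undefined.

```r
# Hypothetical helper that evaluates the ultimate cluster formula directly.
# z: analysis variable, H: stratum, PSU: primary sampling unit, w: weights,
# f_h: sampling fraction (a single value, or a vector named by stratum).
ultimate_cluster_var <- function(z, H, PSU, w, f_h = 0) {
  d <- data.frame(z_hi = w * z, H = H, PSU = PSU)

  # Weighted PSU totals: sum of w * z over the units within each PSU
  tot <- aggregate(z_hi ~ H + PSU, data = d, FUN = sum)

  # Per stratum: n_h / (n_h - 1) * sum of squared deviations of PSU totals
  per_h <- tapply(tot$z_hi, tot$H, function(t) {
    n_h <- length(t)
    n_h / (n_h - 1) * sum((t - mean(t))^2)
  })

  # Match stratum-specific fractions to the strata by name
  if (length(f_h) > 1) f_h <- f_h[names(per_h)]

  # Apply (1 - f_h) and add up over strata
  sum((1 - f_h) * per_h)
}

# One stratum, three PSUs with two households each, all weights equal to 2
z   <- c(1, 2, 3, 4, 5, 6)
H   <- rep("A", 6)
PSU <- c(1, 1, 2, 2, 3, 3)
w   <- rep(2, 6)

v0    <- ultimate_cluster_var(z, H, PSU, w)               # 192
v_fpc <- ultimate_cluster_var(z, H, PSU, w, f_h = 0.25)   # 144
```

Here the PSU totals are 6, 14 and 22 with mean 14, so the sum of squared deviations is 128 and the factor 3/2 gives 192; a sampling fraction of 0.25 scales this by 0.75.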

Value

A data.table containing the variance estimates for the totals.

References

Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second onwards? 2012
Eurostat Methodologies and Working papers, Standard error estimation for the EU-SILC indicators of poverty and social exclusion, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

See Also

domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, residual_est, vardom, vardomh, varpoord, variance_othstr

Examples

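library("vardpoor")

Ys <- rchisq(10, 3)        # variable of interest
w <- rep(2, 10)            # final weights
PSU <- seq_along(Ys)       # each unit is its own PSU
H <- rep("Strata_1", 10)   # a single stratum

# default (fh_zero = FALSE): the finite population correction is applied
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w)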


## Not run: 
 # fh_zero = FALSE: the finite population correction is applied
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = FALSE)
 
 # fh_zero = TRUE: fh is set to zero, so no finite population correction
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = TRUE)
 
## End(Not run)


[Package vardpoor version 0.20.1 Index]