R: Variance estimation for sample surveys in domain for one or...

vardomh {vardpoor}

R Documentation

Variance estimation for sample surveys in domain for one or two stage surveys by the ultimate cluster method

Description

Computes variance estimates in domains at the ID_level1 level.

Usage

vardomh(
  Y,
  H,
  PSU,
  w_final,
  ID_level1,
  ID_level2,
  Dom = NULL,
  period = NULL,
  N_h = NULL,
  PSU_sort = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  Z = NULL,
  dataset = NULL,
  X = NULL,
  periodX = NULL,
  X_ID_level1 = NULL,
  ind_gr = NULL,
  g = NULL,
  q = NULL,
  datasetX = NULL,
  confidence = 0.95,
  percentratio = 1,
  outp_lin = FALSE,
  outp_res = FALSE
)

Arguments

`Y`	Variables of interest. Object convertible to `data.table` or variable names as character, column numbers.
`H`	The unit stratum variable. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`PSU`	Primary sampling unit variable. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`w_final`	Weight variable. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`ID_level1`	Variable for level1 ID codes. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`ID_level2`	Variable for unit ID codes. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`Dom`	Optional variables used to define population domains. If supplied, values are calculated for each domain. An object convertible to `data.table` or variable names as character vector, column numbers.
`period`	Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to `data.table` or variable names as character, column numbers.
`N_h`	Number of primary sampling units in the population for each stratum (and period, if `period` is not `NULL`). If `N_h = NULL` and `fh_zero = FALSE` (default), `N_h` is estimated from the sample data as the sum of weights (`w_final`) in each stratum (and period, if `period` is not `NULL`). Optional for single-stage sampling designs, since it can be estimated from the sample data. Recommended for multi-stage sampling designs, since `N_h` cannot be correctly estimated from the sample data in that case. If `N_h` is not supplied for a multi-stage sampling design (for example, because the information is not available), it is advisable to set `fh_zero = TRUE`. If `period` is `NULL`: a two-column data object convertible to `data.table` with one row per stratum; the first column should contain the stratum code and the second the number of primary sampling units in the population of each stratum. If `period` is not `NULL`: a three-column data object convertible to `data.table` with one row per combination of stratum and period; the first column should contain the period, the second the stratum code, and the third the number of primary sampling units in the population of each stratum and period.
`PSU_sort`	Optional; if `PSU_sort` is defined, the variance is calculated for a systematic sample.
`fh_zero`	Logical, `FALSE` by default. If `FALSE`, `fh` is calculated as n_h divided by N_h in each stratum; if `TRUE`, `fh` is set to zero in each stratum.
`PSU_level`	Logical, `TRUE` by default. If `TRUE`, `fh` in each stratum is calculated as the number of PSUs in the sample (n_h) divided by the number of PSUs in the frame (N_h). If `FALSE`, `fh` in each stratum is calculated as the number of units in the sample (n_h) divided by the number of units in the frame (N_h), which is calculated as the sum of weights.
`Z`	Optional variables of denominator for ratio estimation. Object convertible to `data.table` or variable names as character, column numbers or logical vector (length of the vector has to be the same as the column count of `dataset`).
`dataset`	Optional survey data object convertible to `data.table`.
`X`	Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to `data.table` or variable names as character, column numbers.
`periodX`	Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to `data.table` or variable names as character, column numbers.
`X_ID_level1`	Variable for level1 ID codes. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`ind_gr`	Optional grouping variable; the X matrix of auxiliary variables for the calibration is split by this variable and each group is treated independently. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`g`	Optional variable of the g weights. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`q`	Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column `data.table` or variable name as character, column number.
`datasetX`	Optional survey data object in level1 convertible to `data.table`.
`confidence`	Optional positive value giving the confidence level of the confidence interval; 0.95 by default.
`percentratio`	Positive numeric value. All linearized variables are multiplied by `percentratio`; 1 by default.
`outp_lin`	Logical value. If `TRUE`, linearized values of the ratio estimator will be printed out.
`outp_res`	Logical value. If `TRUE`, estimated residuals of calibration will be printed out.
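
As an illustration of the `N_h`, `fh_zero` and `PSU_level` arguments described above, the following base R sketch builds `N_h` objects in the two documented layouts and computes the sampling fraction `fh` per stratum for the `PSU_level = TRUE` case. The sample, the frame counts and all object names (`smp`, `N_h_period`, `fh`) are hypothetical; this is not the package's internal code.

```r
# Hypothetical sample: one row per unit, with stratum (H) and PSU identifiers.
smp <- data.frame(
  H   = c("A", "A", "A", "A", "B", "B", "B"),
  PSU = c(1, 1, 2, 3, 4, 5, 5)
)

# N_h without periods: column 1 = stratum code, column 2 = number of PSUs
# in the population of that stratum (made-up frame counts).
N_h <- data.frame(H = c("A", "B"), N_h = c(30, 20))

# N_h with periods: column 1 = period, column 2 = stratum code,
# column 3 = number of PSUs in the population (again made-up).
N_h_period <- data.frame(
  period = c(1, 1, 2, 2),
  H      = c("A", "B", "A", "B"),
  N_h    = c(30, 20, 32, 21)
)

# n_h: number of distinct sampled PSUs per stratum (PSU_level = TRUE case).
n_h <- aggregate(PSU ~ H, data = smp, FUN = function(x) length(unique(x)))
names(n_h)[2] <- "n_h"

# Sampling fraction fh = n_h / N_h; with fh_zero = TRUE it would be 0 instead.
fh <- merge(n_h, N_h, by = "H")
fh$fh <- fh$n_h / fh$N_h
fh$fh_if_fh_zero <- 0
print(fh)
```

A plain `data.frame` is used here because the argument only needs to be convertible to `data.table`; the period layout `N_h_period` would be paired with a `period` argument.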

Details

Calculates variance estimates in domains for household surveys, based on the book by Hansen, Hurwitz and Madow (1953).
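
For orientation, the ultimate cluster variance estimator is usually written in the textbook form below. The notation is an assumption made for this sketch and does not come from this page: h indexes strata, i indexes PSUs within a stratum, j indexes units within a PSU, w is the final weight and f_h is the sampling fraction `fh`. The exact expression used inside the package may differ in detail.

```latex
\widehat{V}(\hat{Y}) = \sum_{h} (1 - f_h)\, \frac{n_h}{n_h - 1}
  \sum_{i=1}^{n_h} \left( y_{hi\bullet} - \bar{y}_{h\bullet} \right)^2,
\qquad
y_{hi\bullet} = \sum_{j} w_{hij}\, y_{hij},
\qquad
\bar{y}_{h\bullet} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi\bullet},
\qquad
f_h = \frac{n_h}{N_h}
```

When a denominator `Z` is supplied, the standard Taylor linearisation of the ratio R = Y / Z replaces y by the linearised variable u = (y - R z) / Z, with R and Z estimated from the weighted sample, before the same formula is applied.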

Value

The function returns a list with the following objects:

lin_out A data.table containing the linearized values of the ratio estimator with ID_level2 and PSU.
res_out A data.table containing the estimated residuals of calibration with ID_level1 and PSU.
betas A numeric data.table containing the estimated coefficients of calibration.
all_result A data.table containing the variables: variable - names of the variables of interest,
Dom - optional variable of the population domains,
period - optional variable of the survey periods,
respondent_count - the count of respondents,
pop_size - the estimated size of population,
n_nonzero - the count of respondents whose answers are larger than zero,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error in percentage,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the positive value for confidence interval,
S2_y_HT - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using non-calibrated weights,
S2_y_ca - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using calibrated weights,
S2_res - the estimated variance of the regression residuals,
var_srs_HT - the estimated variance of the HT estimator under SRS for household,
var_cur_HT - the estimated variance of the HT estimator under current design for household,
var_srs_ca - the estimated variance of the calibrated estimator under SRS for household,
deff_sam - the estimated design effect of sample design for household,
deff_est - the estimated design effect of estimator for household,
deff - the overall estimated design effect of sample design and estimator for household
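
The precision columns of `all_result` are related to each other in a simple way. The base R sketch below shows those relations for a single made-up estimate; it assumes a two-sided normal quantile for the margins and interval bounds, which is the usual convention but is not stated on this page, and the numbers are purely hypothetical.

```r
# Hypothetical estimate and variance for one domain (made-up numbers).
estim      <- 20000
var_est    <- 250000
confidence <- 0.95

se  <- sqrt(var_est)   # estimated standard error
rse <- se / estim      # relative standard error
cv  <- 100 * rse       # relative standard error in percent

# Assumption: two-sided normal quantile for the given confidence level.
z <- qnorm(0.5 * (1 + confidence))

absolute_margin_of_error <- z * se
relative_margin_of_error <- z * cv
CI_lower <- estim - absolute_margin_of_error
CI_upper <- estim + absolute_margin_of_error

print(c(se = se, rse = rse, cv = cv, CI_lower = CI_lower, CI_upper = CI_upper))
```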

References

Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL http://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.

Examples

library("data.table")
library("laeken")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
aa <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
             w_final = "rb050", ID_level1 = "db030",
             ID_level2 = "rb030", Dom = "db040", period = NULL,
             N_h = NULL, Z = NULL, dataset = dataset1, X = NULL,
             X_ID_level1 = NULL, g = NULL, q = NULL, 
             datasetX = NULL, confidence = 0.95, percentratio = 1,
             outp_lin = TRUE, outp_res = TRUE)

## Not run: 
dataset2 <- copy(dataset1)
dataset1$period <- 1
dataset2$period <- 2
dataset1 <- data.table(rbind(dataset1, dataset2))

# default (fh_zero = FALSE): the finite population correction is applied
aa2 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030",
               ID_level2 = "rb030", Dom = "db040", period = "period",
               N_h = NULL, Z = NULL, dataset = dataset1,
               X = NULL, X_ID_level1 = NULL,  
               g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa2

# fh_zero = FALSE set explicitly: the finite population correction is applied
aa3 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030", 
               ID_level2 = "rb030", Dom = "db040",
               period = "period", N_h = NULL, fh_zero = FALSE, 
               Z = NULL, dataset = dataset1, X = NULL,
               X_ID_level1 = NULL, g = NULL, q = NULL,
               datasetX = NULL, confidence = .95,
               percentratio = 1, outp_lin = TRUE,
               outp_res = TRUE)
aa3

# fh_zero = TRUE: fh is set to zero, so no finite population correction is applied
aa4 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030",
               ID_level2 = "rb030", Dom = "db040",
               period = "period", N_h = NULL, fh_zero = TRUE, 
               Z = NULL, dataset = dataset1,
               X = NULL, X_ID_level1 = NULL, 
               g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa4
## End(Not run)

[Package vardpoor version 0.20.1 Index]