define_variance_wrapper {gustave}        R Documentation
Define a variance estimation wrapper
Description
Given a variance estimation function (specific to a survey), define_variance_wrapper defines a variance estimation wrapper that is easier to use (e.g. automatic domain estimation, linearization).
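Linearization is what lets a wrapper handle a non-linear statistic such as a ratio with a variance function written for totals. The following is a minimal base R sketch, assuming simple random sampling without replacement and the textbook linearized variable of a ratio. The helpers var_srs_total() and linearize_ratio() are hypothetical and are not gustave functions:

```r
# Hypothetical sketch, not gustave's implementation: variance of a ratio
# R = Y / X by linearization under simple random sampling without replacement.

# Variance estimator of an estimated total under SRSWOR (n sampled out of N)
var_srs_total <- function(z, N) {
  n <- length(z)
  N^2 * (1 - n / N) * var(z) / n
}

# Linearized variable of the ratio: z_k = (y_k - R_hat * x_k) / X_hat
linearize_ratio <- function(y, x, w) {
  X_hat <- sum(w * x)
  R_hat <- sum(w * y) / X_hat
  (y - R_hat * x) / X_hat
}

set.seed(1)
N <- 1000
n <- 50
w <- rep(N / n, n)                     # design weights 1 / pi_k
x <- rpois(n, 20) + 1                  # denominator variable (strictly positive)
y <- rbinom(n, size = x, prob = 0.3)   # numerator variable, y <= x
R_hat <- sum(w * y) / sum(w * x)
z <- linearize_ratio(y, x, w)
v_ratio <- var_srs_total(z, N)         # approximate variance of R_hat
```

The weighted sum of z is zero by construction, and the variance of its estimated total approximates the variance of R_hat.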
Usage
define_variance_wrapper(
variance_function,
reference_id,
reference_weight,
default_id = NULL,
technical_data = NULL,
technical_param = NULL,
objects_to_include = NULL
)
Arguments
variance_function |
An R function. It is the methodological workhorse of the variance estimation: from a set of arguments including the variables of interest (see below), it should return a vector of estimated variances. See Details. |
reference_id |
A vector containing the ids of all the responding units
of the survey. It can also be an unevaluated expression (enclosed in
|
reference_weight |
A vector containing the reference weight of the survey.
It can also be an unevaluated expression (enclosed in |
default_id |
A character vector of length 1, the name of the default
identifying variable in the survey file. It can also be an unevaluated
expression (enclosed in |
technical_data |
A named list of technical data needed to perform
the variance estimation (e.g. sampling strata, first- or second-order
probabilities of inclusion, estimated response probabilities, calibration
variables). Its names should match the names of the corresponding arguments
in variance_function. |
technical_param |
A named list of technical parameters used to control
some aspect of the variance estimation process (e.g. alternative methodology).
Its names should match the names of the corresponding arguments in variance_function. |
objects_to_include |
(Advanced use) A character vector indicating the name of additional R objects to include within the variance wrapper. |
Details
Defining variance estimation wrappers is the key feature of the gustave package. define_variance_wrapper is the workhorse of the ready-to-use qvar function and should be used directly to handle more complex cases (e.g. surveys with several stages or balanced sampling).
Analytical variance estimation is often difficult for non-specialists to carry out owing to the complexity of the underlying sampling and estimation methodology. This complexity yields complex variance estimation functions that are most often used only by the sampling expert who actually wrote them. A variance estimation wrapper is an intermediate function that is "wrapped around" the (complex) variance estimation function in order to provide the non-specialist with user-friendly features (see examples):
- calculation of complex statistics (see standard statistic wrappers)
- domain estimation
- handy evaluation and factor discretization
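For the domain estimation feature listed above, a common approach (assumed here, not taken from gustave's source) is to set the variable of interest to 0 outside the domain and to apply the unchanged variance function to the whole sample. A base R sketch under simple random sampling without replacement, where var_srs_total() is a hypothetical helper:

```r
# Hypothetical sketch of domain estimation: zero the variable outside the
# domain and keep all sampled units in the variance computation.

var_srs_total <- function(y, N) {
  # SRSWOR variance estimator of an estimated total
  n <- length(y)
  N^2 * (1 - n / N) * var(y) / n
}

set.seed(2)
N <- 500
n <- 40
income <- rlnorm(n, meanlog = 10)
female <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Domain variable: income for units in the domain, 0 elsewhere
y_domain <- ifelse(female, income, 0)
total_domain <- sum(N / n * y_domain)
v_domain <- var_srs_total(y_domain, N)  # computed on all n sampled units
```

Computing the variance on all sampled units, rather than only on the units of the domain, accounts for the randomness of the domain sample size.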
define_variance_wrapper allows the sampling expert to define a variance estimation wrapper around a given variance estimation function and set its default parameters. The produced variance estimation wrapper is standalone in the sense that it contains all technical data necessary to carry out the estimation (see technical_data).
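The "standalone" property can be illustrated with an R closure: the wrapper keeps a reference to the technical data, so it still works after the original objects are removed. This sketch assumes a stratified simple random sampling design; make_wrapper() and var_strat() are hypothetical and much simpler than define_variance_wrapper() (no technical parameters, domains or statistic wrappers):

```r
# Hypothetical sketch of the "standalone" idea; make_wrapper() is illustrative.
make_wrapper <- function(variance_function, technical_data) {
  force(variance_function)
  force(technical_data)  # evaluated now and stored in the closure environment
  function(y) do.call(variance_function, c(list(y = y), technical_data))
}

# Stratified SRSWOR variance of an estimated total
var_strat <- function(y, strata, N_h) {
  strata <- factor(strata)
  h <- levels(strata)
  n_h <- as.numeric(table(strata)[h])
  s2_h <- as.numeric(tapply(y, strata, var)[h])
  N <- as.numeric(N_h[h])
  sum(N^2 * (1 - n_h / N) * s2_h / n_h)
}

set.seed(3)
tech <- list(strata = rep(c("a", "b"), times = c(10, 15)),
             N_h = c(a = 100, b = 300))
precision <- make_wrapper(var_strat, tech)
rm(tech)  # the wrapper keeps working: its technical data live in the closure
y <- rnorm(25, mean = 50, sd = 10)
v <- precision(y)
```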
The arguments of variance_function fall into three types:
- the data argument (mandatory, only one allowed): the numerical matrix of variables of interest to which the variance estimation formula is applied
- technical data arguments (optional, one or more allowed): technical and methodological information used by the variance estimation function (e.g. sampling strata, first- or second-order probabilities of inclusion, estimated response probabilities, calibration variables)
- technical parameters (optional, one or more allowed): non-data arguments used to control some aspect of the variance estimation (e.g. alternative methodology)
technical_data and technical_param are used to determine which arguments of variance_function relate to technical information; the only remaining argument is considered to be the data argument.
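The rule above can be sketched with formals(): the names listed in technical_data and technical_param are removed from the arguments of variance_function, and exactly one name should remain. find_data_argument() is a hypothetical helper, not necessarily how gustave implements this check:

```r
# Hypothetical helper (not a gustave function): the data argument is the single
# argument of variance_function named neither in technical_data nor in
# technical_param.
find_data_argument <- function(variance_function, technical_data = NULL,
                               technical_param = NULL) {
  remaining <- setdiff(names(formals(variance_function)),
                       c(names(technical_data), names(technical_param)))
  if (length(remaining) != 1) {
    stop("expected exactly one data argument, found: ",
         paste(remaining, collapse = ", "))
  }
  remaining
}

# Same signature as var_lfs in the examples, with a dummy body
var_lfs_stub <- function(y, ind, dwel, area) NULL
data_arg <- find_data_argument(var_lfs_stub,
                               technical_data = list(ind = 1, dwel = 2, area = 3))

# A variance function with one technical parameter
var_with_param <- function(y, strata, method = "default") NULL
data_arg_2 <- find_data_argument(var_with_param,
                                 technical_data = list(strata = 1),
                                 technical_param = list(method = "alt"))
```

Both calls return "y"; leaving two arguments unaccounted for (e.g. only ind in technical_data for var_lfs_stub) raises an error.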
Value
An R function that makes the estimation of variance based on the provided variance function easier. Its parameters are:
- data: one or more calls to a statistic wrapper (e.g. total(), mean(), ratio()). See examples and standard statistic wrappers.
- where: a logical vector indicating a domain on which the variance estimation is to be performed
- by: a qualitative variable whose levels are used to define domains on which the variance estimation is performed
- alpha: a numeric vector of length 1 indicating the threshold for confidence interval derivation (0.05 by default)
- display: a logical vector of length 1 indicating whether the result of the estimation should be displayed or not
- id: a character vector of length 1 containing the name of the identifying variable in the survey file. Its default value depends on the value of default_id in define_variance_wrapper.
- envir: an environment containing a binding to data
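The by and alpha parameters can be sketched as follows, assuming simple random sampling without replacement and a normal-approximation confidence interval at level 1 - alpha (the exact interval computed by gustave is not stated on this page). by_matrix() is hypothetical: it turns one variable into one column per level of by, so that a single call to the variance function covers all domains:

```r
# Hypothetical sketch; var_srs_total() and by_matrix() are not gustave functions.
var_srs_total <- function(Y, N) {
  # SRSWOR variance estimator of each column total
  Y <- as.matrix(Y)
  n <- nrow(Y)
  N^2 * (1 - n / N) * apply(Y, 2, var) / n
}

# One column per level of `by`, the variable being 0 outside that level
by_matrix <- function(y, by) {
  by <- factor(by)
  sapply(levels(by), function(l) y * (by == l))
}

set.seed(4)
N <- 2000
n <- 100
w <- rep(N / n, n)
income <- rlnorm(n, meanlog = 10)
region <- sample(c("north", "south", "west"), n, replace = TRUE)

Y <- by_matrix(income, region)
est <- colSums(w * Y)
se <- sqrt(var_srs_total(Y, N))
alpha <- 0.05
z <- qnorm(1 - alpha / 2)  # about 1.96 for alpha = 0.05
result <- data.frame(region = colnames(Y), est = est,
                     lower = est - z * se, upper = est + z * se)
```

The domain totals add up to the overall estimated total, since each sampled unit contributes to exactly one column.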
Author(s)
Martin Chevalier
References
Rao, J.N.K. (1975), "Unbiased variance estimation for multistage designs", Sankhya, Series C, n°37
See Also
qvar, standard statistic wrappers, varDT
Examples
### Example from the Labour force survey (LFS)
library(gustave)
# The (simulated) Labour force survey (LFS) has the following characteristics:
# - first sampling stage: balanced sampling of 4 areas (each corresponding to
# about 120 dwellings) on first-order probability of inclusion (proportional to
# the number of dwellings in the area) and total annual income in the area.
# - second sampling stage: in each sampled area, simple random sampling of 20
# dwellings
# - neither non-response nor calibration
# As this is a multi-stage sampling design with balanced sampling at the first
# stage, the qvar function does not apply. A variance wrapper can nonetheless
# be defined using the core define_variance_wrapper function.
# Step 1: Definition of the variance function and the corresponding technical data
# In this context, the variance estimation function specific to the LFS
# survey can be defined as follows:
var_lfs <- function(y, ind, dwel, area){
variance <- list()
# Variance associated with the sampling of the dwellings
y <- sum_by(y, ind$id_dwel)
variance[["dwel"]] <- var_srs(
y = y, pik = dwel$pik_dwel, strata = dwel$id_area,
w = (1 / dwel$pik_area^2 - dwel$q_area)
)
# Variance associated with the sampling of the areas
y <- sum_by(y = y, by = dwel$id_area, w = 1 / dwel$pik_dwel)
variance[["area"]] <- varDT(y = y, precalc = area)
Reduce(`+`, variance)
}
# where y is the matrix of variables of interest and ind, dwel and area the technical data:
technical_data_lfs <- list()
# Technical data at the area level
# The varDT function allows for the pre-calculation of
# most of the methodological quantities needed.
technical_data_lfs$area <- varDT(
y = NULL,
pik = lfs_samp_area$pik_area,
x = as.matrix(lfs_samp_area[c("pik_area", "income")]),
id = lfs_samp_area$id_area
)
# Technical data at the dwelling level
# In order to implement Rao (1975) formula for two-stage samples,
# we associate each dwelling with the diagonal term corresponding
# to its area in the first-stage variance estimator:
lfs_samp_dwel$q_area <- with(technical_data_lfs$area, setNames(diago, id))[lfs_samp_dwel$id_area]
technical_data_lfs$dwel <- lfs_samp_dwel[c("id_dwel", "pik_dwel", "id_area", "pik_area", "q_area")]
# Technical data at the individual level
technical_data_lfs$ind <- lfs_samp_ind[c("id_ind", "id_dwel", "sampling_weight")]
# Test of the variance function var_lfs
y <- matrix(as.numeric(lfs_samp_ind$unemp), ncol = 1, dimnames = list(lfs_samp_ind$id_ind))
with(technical_data_lfs, var_lfs(y = y, ind = ind, dwel = dwel, area = area))
# Step 2: Definition of the variance wrapper
# Call of define_variance_wrapper
precision_lfs <- define_variance_wrapper(
variance_function = var_lfs,
technical_data = technical_data_lfs,
reference_id = technical_data_lfs$ind$id_ind,
reference_weight = technical_data_lfs$ind$sampling_weight,
default_id = "id_ind"
)
# Test
precision_lfs(lfs_samp_ind, unemp)
# The variance wrapper precision_lfs has the same features
# as variance wrappers produced by the qvar function (see
# qvar examples for more details).
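As we read var_lfs, it implements the following two-stage estimator in the spirit of Rao (1975). The notation is introduced here for illustration only: s_I is the set of sampled areas, π_i is pik_area, π_{k|i} is pik_dwel (inclusion probability of dwelling k within area i), q_ij are the coefficients of the first-stage estimator computed by varDT viewed as a quadratic form (q_ii being the diagonal term stored in q_area), and V̂_2i is the simple random sampling variance of Ŷ_i computed by var_srs within area i:

```latex
\hat{V}\left(\sum_{i \in s_I} \frac{\hat{Y}_i}{\pi_i}\right)
  = \sum_{i \in s_I} \sum_{j \in s_I} q_{ij} \, \hat{Y}_i \hat{Y}_j
  + \sum_{i \in s_I} \left( \frac{1}{\pi_i^2} - q_{ii} \right) \hat{V}_{2i},
\qquad
\hat{Y}_i = \sum_{k \in s_i} \frac{y_k}{\pi_{k \mid i}}
```

The first sum corresponds to variance[["area"]] and the second to variance[["dwel"]]: its weight w = 1 / pik_area^2 - q_area corrects for the part of the second-stage variance already contained in the first term, which is evaluated on estimated rather than true area totals.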