| varDT {gustave} | R Documentation |
Variance approximation with Deville-Tillé (2005) formula
Description
varDT estimates the variance of the estimator of a total
in the case of a balanced sampling design with equal or unequal probabilities
using Deville-Tillé (2005) formula. Without balancing variables, it falls back
to Deville's (1993) classical approximation. Without balancing variables and
with equal probabilities, it falls back to the classical Horvitz-Thompson
variance estimator for the total in the case of simple random sampling.
Stratification is natively supported.
var_srs is a convenience wrapper for the (stratified) simple random
sampling case.
Usage
varDT(
y = NULL,
pik,
x = NULL,
strata = NULL,
w = NULL,
precalc = NULL,
id = NULL
)
var_srs(y, pik, strata = NULL, w = NULL, precalc = NULL)
Arguments
y |
A (sparse) numerical matrix of the variable(s) whose variance of their total is to be estimated. |
pik |
A numerical vector of first-order inclusion probabilities. |
x |
An optional (sparse) numerical matrix of balancing variable(s). |
strata |
An optional categorical vector (factor or character) when variance estimation is to be conducted within strata. |
w |
An optional numerical vector of row weights (see Details). |
precalc |
A list of pre-calculated results (see Details). |
id |
A vector of identifiers of the units used in the calculation.
Useful when |
Details
varDT aims at being the workhorse of most variance estimation conducted
with the gustave package. It may be used to estimate the variance
of the estimator of a total in the case of (stratified) simple random sampling,
(stratified) unequal probability sampling and (stratified) balanced sampling.
The native integration of stratification based on Matrix::TsparseMatrix allows
for significant performance gains compared to higher level vectorizations
(*apply especially).
Several time-consuming operations (e.g. collinearity-check, matrix
inversion) can be pre-calculated in order to speed up the estimation at
execution time. This is determined by the value of the parameters y
and precalc:
if
ynotNULLandprecalcNULL: on-the-fly calculation (no pre-calculation).if
yNULLandprecalcNULL: pre-calculation whose results are stored in a list of pre-calculated data.if
ynotNULLandprecalcnotNULL: calculation using the list of pre-calculated data.
w is a row weight used at the final summation step. It is useful
when varDT or var_srs are used on the second stage of a
two-stage sampling design applying the Rao (1975) formula.
Value
if
yis notNULL(calculation step) : the estimated variances as a numerical vector of size the number of columns of y.if
yisNULL(pre-calculation step) : a list containing pre-calculated data.
Difference with varest from package sampling
varDT differs from sampling::varest in several ways:
The formula implemented in
varDTis more general and encompasses balanced sampling.Even in its reduced form (without balancing variables), the formula implemented in
varDTslightly differs from the one implemented insampling::varest. Caron (1998, pp. 178-179) compares the two estimators (sampling::varestimplements V_2,varDTimplements V_1).-
varDTintroduces several optimizations:-
matrixwise operations allow to estimate variance on several interest variables at once
Matrix::TsparseMatrix capability and the native integration of stratification yield significant performance gains.
-
the ability to pre-calculate some time-consuming operations speeds up the estimation at execution time.
-
-
varDTdoes not natively implements the calibration estimator (i.e. the sampling variance estimator that takes into account the effect of calibration). In the context of thegustavepackage,res_calshould be called beforevarDTin order to achieve the same result.
Author(s)
Martin Chevalier
References
Caron N. (1998), "Le logiciel Poulpe : aspects méthodologiques", Actes des Journées de méthodologie statistique http://jms-insee.fr/jms1998s03_1/ Deville, J.-C. (1993), Estimation de la variance pour les enquêtes en deux phases, Manuscript, INSEE, Paris.
Deville, J.-C., Tillé, Y. (2005), "Variance approximation under balanced sampling", Journal of Statistical Planning and Inference, 128, issue 2 569-591
Rao, J.N.K (1975), "Unbiased variance estimation for multistage designs", Sankhya, C n°37
See Also
Examples
library(sampling)
set.seed(1)
# Simple random sampling case
N <- 1000
n <- 100
y <- rnorm(N)[as.logical(srswor(n, N))]
pik <- rep(n/N, n)
varDT(y, pik)
sampling::varest(y, pik = pik)
N^2 * (1 - n/N) * var(y) / n
# Unequal probability sampling case
N <- 1000
n <- 100
pik <- runif(N)
s <- as.logical(UPsystematic(pik))
y <- rnorm(N)[s]
pik <- pik[s]
varDT(y, pik)
varest(y, pik = pik)
# The small difference is expected (see Details).
# Balanced sampling case
N <- 1000
n <- 100
pik <- runif(N)
x <- matrix(rnorm(N*3), ncol = 3)
s <- as.logical(samplecube(x, pik))
y <- rnorm(N)[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)
# Balanced sampling case (variable of interest
# among the balancing variables)
N <- 1000
n <- 100
pik <- runif(N)
y <- rnorm(N)
x <- cbind(matrix(rnorm(N*3), ncol = 3), y)
s <- as.logical(samplecube(x, pik))
y <- y[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)
# As expected, the total of the variable of interest is perfectly estimated.
# strata argument
n <- 100
H <- 2
pik <- runif(n)
y <- rnorm(n)
strata <- letters[sample.int(H, n, replace = TRUE)]
all.equal(
varDT(y, pik, strata = strata),
varDT(y[strata == "a"], pik[strata == "a"]) + varDT(y[strata == "b"], pik[strata == "b"])
)
# precalc argument
n <- 1000
H <- 50
pik <- runif(n)
y <- rnorm(n)
strata <- sample.int(H, n, replace = TRUE)
precalc <- varDT(y = NULL, pik, strata = strata)
identical(
varDT(y, precalc = precalc),
varDT(y, pik, strata = strata)
)