DEFF {samplesize4surveys} | R Documentation |
Estimated sample Effects of Design (DEFF)
Description
This function returns the estimated design effects for a set of inclusion probabilities and the variables of interest.
Usage
DEFF(y, pik)
Arguments
y |
Vector, matrix or data frame containing the collected information on the variables of interest for every unit in the selected sample. |
pik |
Vector of inclusion probabilities for each unit in the selected sample. |
Details
The design effect is defined as the ratio of the variance of an estimator under a complex design to its variance under a simple random sample of the same size. When the design is stratified and the allocation is proportional, this measure reduces to
DEFF_{Kish} = 1 + CV^2(w)
where w is the set of sampling weights (defined as the inverse of the inclusion probabilities) over the sample, and CV is the classical coefficient of variation. Although this measure is motivated by a stratified sampling design, it is commonly applied to any kind of survey where sampling weights are unequal. Spencer's DEFF, on the other hand, is motivated by the idea that a set of weights may be efficient even when they vary, and is defined by:
DEFF_{Spencer} = (1 - R^2) * DEFF_{Kish} + \frac{\hat{a}^2}{\hat{\sigma}^2_y} * (DEFF_{Kish} - 1)
where
\hat{\sigma}^2_y = \frac{\sum_s w_k (y_k - \bar{y}_w)^2}{\sum_s w_k}
and \hat{a} is the estimated intercept of the following model
y_k = a + b * p_k + e_k
where p_k = \pi_k / n is a standardized sampling weight. Finally, R^2 is the R-squared of this model.
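The two formulas above can be sketched in base R for a single variable of interest. This is a minimal illustration, not the code used by `DEFF`: the helper names `kish_deff` and `spencer_deff` are hypothetical, the variance inside CV^2(w) is assumed to use divisor n, the regression is assumed to be fitted by weighted least squares with weights w, and the toy values of `pik` and `y` are made up.

```r
# Illustrative sketch only: kish_deff() and spencer_deff() are hypothetical
# helpers, not functions exported by samplesize4surveys.

kish_deff <- function(pik) {
  w <- 1 / pik                      # sampling weights
  n <- length(w)
  # Equals 1 + CV^2(w) when the variance of w uses divisor n (assumption)
  n * sum(w^2) / sum(w)^2
}

spencer_deff <- function(y, pik) {
  n <- length(y)
  w <- 1 / pik                      # sampling weights
  p <- pik / n                      # p_k = pi_k / n
  y_bar_w  <- sum(w * y) / sum(w)   # weighted mean of y
  sigma2_y <- sum(w * (y - y_bar_w)^2) / sum(w)
  # Fit y_k = a + b * p_k + e_k; using weights = w is an assumption
  fit   <- lm(y ~ p, weights = w)
  a_hat <- coef(fit)[["(Intercept)"]]
  # Weighted R-squared of the fitted model
  r2 <- 1 - sum(w * residuals(fit)^2) / sum(w * (y - y_bar_w)^2)
  k  <- kish_deff(pik)
  (1 - r2) * k + (a_hat^2 / sigma2_y) * (k - 1)
}

# Made-up toy values, for illustration only
pik <- c(0.10, 0.20, 0.25, 0.40, 0.50)
y   <- c(12, 30, 28, 55, 61)
kish_deff(pik)
spencer_deff(y, pik)
```

With equal inclusion probabilities the Kish measure equals 1, and it grows as the weights become more variable. The sketch handles a numeric vector `y` only; for a matrix or data frame, one option would be to apply it column by column.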
Author(s)
Hugo Andres Gutierrez Rojas <hagutierrezro at gmail.com>
References
Gutierrez, H. A. (2009), Estrategias de muestreo: Diseño de encuestas y estimación de parámetros. Editorial Universidad Santo Tomas.
Valliant, R., et al. (2013), Practical Tools for Designing and Weighting Survey Samples. Springer.
Examples
#############################
# Example with BigLucy data #
#############################
data(BigLucy)
attach(BigLucy)
# The sample size
n <- 400
res <- S.piPS(n, Income)
sam <- res[,1]
# The information about the units in the sample is stored in an object called data
data <- BigLucy[sam,]
attach(data)
names(data)
# pik holds the inclusion probability of every unit in the selected sample
pik <- res[,2]
# The variables of interest are: Income, Employees and Taxes
# This information is stored in a data frame called estima
estima <- data.frame(Income, Employees, Taxes)
E.piPS(estima,pik)
DEFF(estima,pik)