RRTCS-package {RRTCS} R Documentation

Randomized Response Techniques for Complex Surveys

Description

This package computes point and interval estimates of linear parameters from data obtained in randomized response (RR) surveys. Twenty-one RR methods are implemented for complex surveys:

- Randomized response procedures to estimate parameters of a qualitative stigmatizing characteristic: Christofides model, Devore model, Forced-Response model, Horvitz model, Horvitz model with unknown B, Kuk model, Mangat model, Mangat model with unknown B, Mangat-Singh model, Mangat-Singh-Singh model, Mangat-Singh-Singh model with unknown B, Singh-Joarder model, SoberanisCruz model and Warner model.

- Randomized response procedures to estimate parameters of a quantitative stigmatizing characteristic: BarLev model, Chaudhuri-Christofides model, Diana-Perri-1 model, Diana-Perri-2 model, Eichhorn-Hayre model, Eriksson model and Saha model.

Using the usual notation in survey sampling, we consider a finite population $U=\{1,\ldots,i,\ldots,N\}$ consisting of $N$ different elements. Let $y_i$ be the value of the sensitive characteristic under study for the $i$th population element. Our aim is to estimate the finite population total $Y=\sum_{i=1}^N y_i$ of the variable of interest $y$, or the population mean $\bar{Y}=\frac{1}{N}\sum_{i=1}^N y_i$. If we want to estimate the proportion of the population presenting a certain stigmatized behaviour $A$, the variable $y_i$ takes the value 1 if $i\in G_A$ (the group with the stigmatized behaviour) and the value 0 otherwise. Some qualitative models use an innocuous or related attribute $B$ whose population proportion may be known or unknown.

Assume that a sample $s$ is chosen according to a general design $p$ with inclusion probabilities $\pi_i=\sum_{s\ni i}p(s)$, $i\in U$.

In order to include a wide variety of RR procedures, we consider the unified approach given by Arnab (1994). The interviews of individuals in the sample $s$ are conducted in accordance with the RR model. For each $i\in s$ the RR induces a random response $z_i$ (called the scrambled response) so that the revised randomized response $r_i$ (Chaudhuri and Christofides, 2013) is an unbiased estimator of $y_i$. Then, an unbiased estimator for the population total of the sensitive characteristic $y$ is given by

$$\widehat{Y}_R=\sum_{i\in s}\frac{r_i}{\pi_i}$$
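The estimator above is a weighted sum over the sample and can be sketched in a few lines. This is only an illustration, not the package's implementation: the function name `rr_total` and the sample values are hypothetical, and the revised responses $r_i$ are assumed to have been computed already for the RR model in use.

```python
def rr_total(r, pi):
    """Horvitz-Thompson-type estimator of the total: sum of r_i / pi_i over the sample."""
    if len(r) != len(pi):
        raise ValueError("r and pi must have the same length")
    if any(not 0.0 < p <= 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(ri / p for ri, p in zip(r, pi))


# Hypothetical sample of four units (illustrative values only).
r = [1.2, -0.3, 0.8, 1.5]    # revised randomized responses r_i
pi = [0.1, 0.1, 0.2, 0.2]    # first-order inclusion probabilities pi_i
total = rr_total(r, pi)
print(total)
```

Note that individual $r_i$ values may fall outside the range of $y_i$ (they can be negative, for instance); only their expectation under the randomization device equals $y_i$.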

The variance of this estimator is given by:

$$V(\widehat{Y}_R)=\sum_{i\in U}\frac{V_R(r_i)}{\pi_i}+V_{HT}(r)$$

where $V_R(r_i)$ is the variance of $r_i$ under the randomization device and $V_{HT}(r)$ is the design variance of the Horvitz-Thompson estimator of the $r_i$ values.

This variance is estimated by:

$$\widehat{V}(\widehat{Y}_R)=\sum_{i\in s}\frac{\widehat{V}_R(r_i)}{\pi_i}+\widehat{V}(r)$$

where $\widehat{V}_R(r_i)$ varies with the RR device and the estimate of the design variance, $\widehat{V}(r)$, is obtained using Deville's method (Deville, 1993).
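To make the two-term structure of the variance estimator concrete, the sketch below evaluates it under simple random sampling without replacement, where $\pi_i=n/N$. It deliberately does not implement Deville's method: the design term is replaced by the textbook estimator $N^2(1-n/N)s_r^2/n$ for that design, which is an assumption made here for illustration. The function name and the data are hypothetical.

```python
from statistics import variance


def rr_total_variance_srs(r, v_r, N):
    """Variance estimate for the RR total under SRS without replacement.

    r   : revised randomized responses r_i in the sample
    v_r : estimated device variances V_R(r_i) for the same units
    N   : population size (assumed known)
    """
    n = len(r)
    if n < 2 or n != len(v_r) or n > N:
        raise ValueError("need 2 <= n <= N and one v_r value per response")
    pi = n / N                                   # equal inclusion probabilities
    device_term = sum(v / pi for v in v_r)       # sum of V_R(r_i) / pi_i
    design_term = N**2 * (1 - n / N) * variance(r) / n   # SRS design variance
    return device_term + design_term


# Hypothetical data: n = 4 units sampled from N = 40.
v_hat = rr_total_variance_srs([1.0, 0.0, 2.0, 1.0], [0.5, 0.5, 0.5, 0.5], N=40)
print(v_hat)
```

For unequal-probability designs the design term would have to come from an estimator suited to that design, such as the one cited above.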

The confidence interval at the $100(1-\alpha)\%$ level is given by

$$ci=\left(\widehat{Y}_R-z_{1-\frac{\alpha}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)},\ \widehat{Y}_R+z_{1-\frac{\alpha}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)}\right)$$

where $z_{1-\frac{\alpha}{2}}$ denotes the $1-\frac{\alpha}{2}$ quantile of the standard normal distribution.
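As a small illustration of the interval above, the following sketch takes a point estimate and its estimated variance and returns the normal-approximation bounds. The function name `rr_confidence_interval` and the input values are hypothetical; `statistics.NormalDist` from the Python standard library supplies the quantile.

```python
from math import sqrt
from statistics import NormalDist


def rr_confidence_interval(estimate, variance, alpha=0.05):
    """Normal-approximation interval: estimate -/+ z_{1-alpha/2} * sqrt(variance)."""
    if variance < 0:
        raise ValueError("variance estimate must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    z = NormalDist().inv_cdf(1 - alpha / 2)   # quantile of order 1 - alpha/2
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical estimate 100 with estimated variance 25, at the 95% level.
lower, upper = rr_confidence_interval(100.0, 25.0, alpha=0.05)
print(lower, upper)
```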

Similarly, an unbiased estimator for the population mean $\bar{Y}$ is given by

$$\widehat{\bar{Y}}_R=\frac{1}{N}\sum_{i\in s}\frac{r_i}{\pi_i}$$

and an unbiased estimator for its variance is calculated as:

$$\widehat{V}(\widehat{\bar{Y}}_R)=\frac{1}{N^2}\left(\sum_{i\in s}\frac{\widehat{V}_R(r_i)}{\pi_i}+\widehat{V}(r)\right)$$

In cases where the population size $N$ is unknown, we consider Hájek-type estimators for the mean:

$$\widehat{\bar{Y}}_{RH}=\frac{\sum_{i\in s}\frac{r_i}{\pi_i}}{\sum_{i\in s}\frac{1}{\pi_i}}$$

and Taylor-series linearization variance estimation of the ratio (Wolter, 2007) is used.
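A Hájek-type mean can be sketched as the ratio of two weighted sums, so that $N$ is never needed. The version below weights each $r_i$ by $1/\pi_i$ in the numerator, as in the standard Hájek estimator, and divides by the estimated population size $\sum_{i\in s}1/\pi_i$. The function name and data are hypothetical, and the linearization variance is not covered here.

```python
def rr_hajek_mean(r, pi):
    """Hajek-type estimator of the mean: (sum r_i/pi_i) / (sum 1/pi_i)."""
    if len(r) != len(pi) or not r:
        raise ValueError("r and pi must be non-empty and of equal length")
    total_hat = sum(ri / p for ri, p in zip(r, pi))   # estimated total of y
    size_hat = sum(1.0 / p for p in pi)               # estimated population size
    return total_hat / size_hat


# Hypothetical sample (illustrative values only).
mean_hat = rr_hajek_mean([1.2, -0.3, 0.8, 1.5], [0.1, 0.1, 0.2, 0.2])
print(mean_hat)
```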

In qualitative models, the values $r_i$ and $\widehat{V}_R(r_i)$ for $i\in s$ are described in each model.
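As an example of what such a model-specific description looks like, consider the textbook form of the Warner model: with probability $p$ the respondent answers whether they belong to $A$, and with probability $1-p$ whether they belong to its complement. Then $E_R(z_i)=(2p-1)y_i+(1-p)$, so $r_i=(z_i-(1-p))/(2p-1)$ with $V_R(r_i)=p(1-p)/(2p-1)^2$. The sketch below encodes this; the function name is hypothetical and the parameterization may differ from the argument conventions of the package's own Warner function.

```python
def warner_transform(z, p):
    """Revised responses and device variances for the textbook Warner design.

    z : list of 0/1 scrambled answers
    p : probability of being asked the direct (sensitive) question, p != 0.5
    """
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    denom = 2 * p - 1
    r = [(zi - (1 - p)) / denom for zi in z]
    v = p * (1 - p) / denom**2        # same value for every respondent
    return r, [v] * len(z)


# Hypothetical answers from two respondents with p = 0.7.
r, v_r = warner_transform([1, 0], p=0.7)
print(r, v_r)
```

The constant variance reflects that, for a binary $y_i$, the answer $z_i$ is a Bernoulli variable with success probability $p$ or $1-p$.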

In some quantitative models, the values $r_i$ and $\widehat{V}_R(r_i)$ for $i\in s$ are calculated in a general form (Arcos et al., 2015) as follows:

The randomized response given by person $i$ is

$$z_i=\left\{\begin{array}{ll} y_i & \textrm{with probability } p_1\\ y_iS_1+S_2 & \textrm{with probability } p_2\\ S_3 & \textrm{with probability } p_3 \end{array}\right.$$

with $p_1+p_2+p_3=1$ and where $S_1$, $S_2$ and $S_3$ are scramble variables whose distributions are assumed to be known. We denote by $\mu_j$ and $\sigma_j$ respectively the mean and standard deviation of the variable $S_j$ $(j=1,2,3)$.
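The three-branch mechanism can be simulated directly. The sketch below draws scrambled responses for a single true value and compares their average with $E_R(z_i)=y_i(p_1+p_2\mu_1)+p_2\mu_2+p_3\mu_3$. The normal scrambling distributions, the parameter values and the function name are assumptions chosen for illustration only.

```python
import random


def scrambled_response(y, p, draw_s1, draw_s2, draw_s3, rng=random):
    """Draw one scrambled response z_i for a true value y."""
    p1, p2, p3 = p
    if abs(p1 + p2 + p3 - 1.0) > 1e-12:
        raise ValueError("p1 + p2 + p3 must equal 1")
    u = rng.random()
    if u < p1:
        return y                                   # true value reported
    if u < p1 + p2:
        return y * draw_s1(rng) + draw_s2(rng)     # multiplicative/additive scrambling
    return draw_s3(rng)                            # unrelated random value


rng = random.Random(2015)          # fixed seed for reproducibility
p = (0.5, 0.3, 0.2)
draws = [
    scrambled_response(
        10.0, p,
        lambda g: g.gauss(1.0, 0.5),   # S1 with mu_1 = 1, sigma_1 = 0.5 (assumed)
        lambda g: g.gauss(0.0, 1.0),   # S2 with mu_2 = 0, sigma_2 = 1   (assumed)
        lambda g: g.gauss(5.0, 2.0),   # S3 with mu_3 = 5, sigma_3 = 2   (assumed)
        rng,
    )
    for _ in range(100_000)
]
mean_z = sum(draws) / len(draws)
expected = 10.0 * (0.5 + 0.3 * 1.0) + 0.3 * 0.0 + 0.2 * 5.0   # equals 9
print(mean_z, expected)
```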

The transformed variable is

$$r_i=\frac{z_i-p_2\mu_2-p_3\mu_3}{p_1+p_2\mu_1},$$

its variance is

$$V_R(r_i)=\frac{1}{(p_1+p_2\mu_1)^2}(y_i^2A+y_iB+C)$$

where

$$A=p_1(1-p_1)+\sigma_1^2p_2+\mu_1^2p_2-\mu_1^2p_2^2-2p_1p_2\mu_1$$

$$B=2p_2\mu_1\mu_2-2\mu_1\mu_2p_2^2-2p_1p_2\mu_2-2\mu_3p_1p_3-2\mu_1\mu_3p_2p_3$$

$$C=(\sigma_2^2+\mu_2^2)p_2+(\sigma_3^2+\mu_3^2)p_3-(\mu_2p_2+\mu_3p_3)^2$$

and the estimated variance is

$$\widehat{V}_R(r_i)=\frac{1}{(p_1+p_2\mu_1)^2}(r_i^2A+r_iB+C).$$
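The transformation and the coefficients $A$, $B$ and $C$ translate term by term into code. The sketch below is a direct transcription of the formulas above, not the package's implementation. The function names and numeric values are hypothetical, and `sigma` holds standard deviations, which are squared inside the function.

```python
def abc_coefficients(p, mu, sigma):
    """Coefficients A, B, C of the quadratic in the device variance."""
    p1, p2, p3 = p
    mu1, mu2, mu3 = mu
    s1, s2, s3 = sigma
    a = (p1 * (1 - p1) + s1**2 * p2 + mu1**2 * p2
         - mu1**2 * p2**2 - 2 * p1 * p2 * mu1)
    b = (2 * p2 * mu1 * mu2 - 2 * mu1 * mu2 * p2**2 - 2 * p1 * p2 * mu2
         - 2 * mu3 * p1 * p3 - 2 * mu1 * mu3 * p2 * p3)
    c = ((s2**2 + mu2**2) * p2 + (s3**2 + mu3**2) * p3
         - (mu2 * p2 + mu3 * p3) ** 2)
    return a, b, c


def transform_response(z, p, mu):
    """Revised response r_i = (z_i - p2*mu2 - p3*mu3) / (p1 + p2*mu1)."""
    p1, p2, p3 = p
    mu1, mu2, mu3 = mu
    return (z - p2 * mu2 - p3 * mu3) / (p1 + p2 * mu1)


def estimated_device_variance(r, p, mu, sigma):
    """Estimated V_R(r_i): the quadratic evaluated at r_i in place of y_i."""
    a, b, c = abc_coefficients(p, mu, sigma)
    scale = (p[0] + p[1] * mu[0]) ** 2
    return (r**2 * a + r * b + c) / scale


# Hypothetical design parameters and one observed scrambled response.
p = (0.5, 0.3, 0.2)
mu = (1.0, 0.0, 5.0)
sigma = (0.5, 1.0, 2.0)
a, b, c = abc_coefficients(p, mu, sigma)
r_i = transform_response(9.0, p, mu)
v_i = estimated_device_variance(r_i, p, mu, sigma)
print(a, b, c, r_i, v_i)
```

Setting $p_1=1$ and $p_2=p_3=0$ gives $A=B=C=0$ and $r_i=z_i$, the direct-response case with no device variance, which is a convenient sanity check.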

Some of the quantitative techniques considered can be viewed as particular cases of the procedure described above. Other models are described in the documentation of the respective function.

Alternatively, the variance can be estimated using certain resampling methods.

Author(s)

Beatriz Cobo Rodríguez, Department of Statistics and Operations Research. University of Granada beacr@ugr.es

María del Mar Rueda García, Department of Statistics and Operations Research. University of Granada mrueda@ugr.es

Antonio Arcos Cebrián, Department of Statistics and Operations Research. University of Granada arcos@ugr.es

Maintainer: Beatriz Cobo Rodríguez beacr@ugr.es

References

Arcos, A., Rueda, M., Singh, S. (2015). A generalized approach to randomised response for quantitative variables. Quality and Quantity 49, 1239-1256.

Arnab, R. (1994). Non-negative variance estimator in randomized response surveys. Comm. Stat. Theor. Meth. 23, 1743-1752.

Chaudhuri, A., Christofides, T.C. (2013). Indirect Questioning in Sample Surveys. Springer-Verlag Berlin Heidelberg.

Deville, J.C. (1993). Estimation de la variance pour les enquêtes en deux phases. Manuscript, INSEE, Paris.

Wolter, K.M. (2007). Introduction to Variance Estimation. 2nd Edition. Springer.


[Package RRTCS version 0.0.4 Index]