expgreg {optimStrat} | R Documentation |
Expected variance of the general regression estimator
Description
Compute the expected design variance of the general regression estimator of the total of a study variable under different sampling designs.
Usage
expgreg(x, b11, b12, b21, b22, d12, Rfy, n, design = NULL,
stratum = NULL, x_des = NULL, inc.p = NULL, ...)
Arguments
x |
design matrix with the variables to be used in the GREG estimator. |
b11 |
a numeric vector of length equal to the number of variables in |
b12 |
a numeric vector of length equal to the number of variables in |
b21 |
a numeric vector of length equal to the number of variables in |
b22 |
a numeric vector of length equal to the number of variables in |
d12 |
a numeric vector of length equal to the number of variables in |
Rfy |
a number giving the square root of the coefficient of determination between the auxiliary variables and the study variable. |
n |
either a positive number indicating the (expected) sample size (when |
design |
a character string giving the sampling design. It must be one of 'srs' (simple random sampling without replacement), 'poi' (Poisson sampling), 'stsi' (stratified simple random sampling), 'pips' (Pareto |
stratum |
a vector indicating the stratum to which every unit belongs. Only used if |
x_des |
a positive numeric vector giving the values of the auxiliary variable that is used for defining the inclusion probabilities. Only used if |
inc.p |
a matrix giving the first and second order inclusion probabilities. Only used if |
... |
other arguments passed to |
Details
The expected variance of the general regression estimator under different sampling designs is computed.
It is assumed that the underlying superpopulation model is of the form
Y_{k} = f(x_{k}|\delta_{1}) + \epsilon_{k}
with E\epsilon_{k}=0, V\epsilon_{k}=\sigma_{0}^{2}g^{2}(x_{k}|\delta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.
However, the true generating model is in fact of the form
Y_{k} = f(x_{k}|\beta_{1}) + \epsilon_{k}
with E\epsilon_{k}=0, V\epsilon_{k}=\sigma^{2}g^{2}(x_{k}|\beta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.
Where
f(x_{k}|\delta_{1}) = \sum_{j=1}^{J}\delta_{1,j}x_{jk}^{\delta_{1,J+j}},
g(x_{k}|\delta_{2}) = \sum_{j=1}^{J}\delta_{2,j}x_{jk}^{\delta_{2,J+j}},
f(x_{k}|\beta_{1}) = \sum_{j=1}^{J}\beta_{1,j}x_{jk}^{\beta_{1,J+j}},
g(x_{k}|\beta_{2}) = \sum_{j=1}^{J}\beta_{2,j}x_{jk}^{\beta_{2,J+j}}.
The parameters are supplied through the arguments as follows:
the coefficients \beta_{1,j} (j=1,\cdots,J) are given by b11;
the exponents \beta_{1,j} (j=J+1,\cdots,2J) are given by b12;
the coefficients \beta_{2,j} (j=1,\cdots,J) are given by b21;
the exponents \beta_{2,j} (j=J+1,\cdots,2J) are given by b22;
the exponents \delta_{1,j} (j=J+1,\cdots,2J) are given by d12.
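As a rough illustration of how b11, b12, b21 and b22 parameterise the true model, the following base R sketch evaluates f(x_{k}|\beta_{1}) and g(x_{k}|\beta_{2}) directly from the sum-of-powers formulas above. The helper power_sum and the small matrix x are hypothetical and are not part of optimStrat; this is only meant to show the mapping between arguments and formula, not how expgreg is implemented.

```r
# Hypothetical helper (not part of optimStrat): for every row k of x it returns
# sum_j coef[j] * x[k, j]^expo[j], the functional form shared by f and g.
power_sum <- function(x, coef, expo) {
  x <- as.matrix(x)
  stopifnot(length(coef) == ncol(x), length(expo) == ncol(x))
  xp <- sweep(x, 2, expo, FUN = "^")  # raise column j to the power expo[j]
  drop(xp %*% coef)                   # weighted sum across columns
}

# Small made-up population with J = 3 auxiliary variables
x <- cbind(x1 = c(1, 2, 3), x2 = c(4, 5, 6), x3 = c(9, 16, 25))

# f(x_k | beta_1) with b11 = c(1, -1, 0) and b12 = c(1, 1, 0): x1 - x2
f_true <- power_sum(x, coef = c(1, -1, 0), expo = c(1, 1, 0))

# g(x_k | beta_2) with b21 = c(0, 0, 1) and b22 = c(0, 0, 0.5): sqrt(x3)
g_true <- power_sum(x, coef = c(0, 0, 1), expo = c(0, 0, 0.5))
```

A zero coefficient removes a variable from the sum regardless of its exponent, which is why the examples on this page pair a coefficient of 0 with an exponent of 0.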
The expected variance of the GREG estimator is approximated by
E\left(V\left(\hat{t}\right)\right) = V\left(\hat{t}_{z}\right) + \hat{\sigma}^{2}\sum_{k=1}^{N}\left(\frac{1}{\pi_{k}}-1\right)g^{2}(x_{k}|\beta_{2})
where
V\left(\hat{t}_{z}\right) = \sum_{k=1}^{N}\sum_{l=1}^{N}\pi_{kl}\frac{z_{k}}{\pi_{k}}\frac{z_{l}}{\pi_{l}} - \left(\sum_{k=1}^{N}z_{k}\right)^{2}
and
\hat{\sigma}^{2} = \frac{S^{2}_{f}}{\bar{g^{2}}}\left(\frac{1}{R^{2}_{fy}}-1\right),
z_{k} = \left(x_{k}^{\beta}-x_{k}^{\delta}A\right)\beta_{1}^{**},
S^{2}_{f} = \sum_{k=1}^{N}(f(x_{k}|\beta_{1})-\bar{f})^{2}/N,
\bar{g^{2}} = \sum_{k=1}^{N}g(x_{k}|\beta_{2})^{2}/N,
x_{k}^{\beta} = \left(x_{1k}^{\beta_{1,J+1}},\cdots,x_{Jk}^{\beta_{1,2J}}\right),
x_{k}^{\delta} = \left(x_{1k}^{\delta_{1,J+1}},\cdots,x_{Jk}^{\delta_{1,2J}}\right),
\beta_{1}^{**} = (\beta_{1,1},\cdots,\beta_{1,J})',
A = \left(\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\delta}\right)^{-1}\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\beta}.
N is the population size, and \pi_{k} and \pi_{kl} are, respectively, the first and second order inclusion probabilities. w_{k} is a weight associated with each element; it represents the inverse of the conditional variance (up to a scalar) of the underlying superpopulation model (see ‘Examples’).
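The quantities defined above can be assembled step by step. The sketch below is a base R illustration under stated assumptions: a made-up population with J = 2 variables, unit weights w_{k} = 1, and variable names (x_beta, x_delta, sigma2 and so on) chosen here for readability. It follows the formulas as written and is not taken from the package source.

```r
set.seed(1)
N <- 200
# Made-up auxiliary variables (assumption for illustration only)
x <- cbind(x1 = 1 + rgamma(N, shape = 2, scale = 10),
           x2 = 1 + rgamma(N, shape = 2, scale = 10))

b11 <- c(1, 2); b12 <- c(1.2, 1)  # coefficients and exponents of the true f
b21 <- c(1, 0); b22 <- c(0.5, 0)  # coefficients and exponents of the true g
d12 <- c(1, 1)                    # exponents assumed by the working model
Rfy <- 0.8
w   <- rep(1, N)                  # assumed weights w_k

x_beta  <- sweep(x, 2, b12, "^")  # rows are x_k^beta
x_delta <- sweep(x, 2, d12, "^")  # rows are x_k^delta

# A = (sum_k w_k x_k^delta' x_k^delta)^{-1} sum_k w_k x_k^delta' x_k^beta
A <- solve(crossprod(x_delta, w * x_delta), crossprod(x_delta, w * x_beta))

# z_k = (x_k^beta - x_k^delta A) beta_1^{**}
z <- drop((x_beta - x_delta %*% A) %*% b11)

# sigma^2 hat = S_f^2 / mean(g^2) * (1 / Rfy^2 - 1)
f      <- drop(x_beta %*% b11)
g2     <- drop(sweep(x, 2, b22, "^") %*% b21)^2
S2f    <- mean((f - mean(f))^2)
sigma2 <- S2f / mean(g2) * (1 / Rfy^2 - 1)
```

Because A is a weighted least-squares coefficient matrix, z is the part of the true mean function that the working model cannot reproduce. When d12 equals b12, z is zero and only the second term of the expected variance remains.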
If design=NULL, the matrix of inclusion probabilities is obtained proportional to the matrix inc.p. If design is other than NULL, the formula for the variance is simplified in such a way that the matrix of inclusion probabilities is no longer necessary. In particular:
if design='srs', only the sample size n is required;
if design='stsi', both the stratum ID stratum and the sample size per stratum n are required;
if design is either 'pips' or 'poi', the inclusion probabilities are obtained proportional to the values of x_des, corrected if necessary.
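To see why the full matrix of inclusion probabilities can be avoided, consider simple random sampling without replacement. The base R sketch below, using made-up values of z_{k}, compares the general double-sum expression for V(\hat{t}_{z}) with the textbook closed form that needs only n and N. It illustrates the kind of simplification described above and makes no claim about the exact code path used inside expgreg.

```r
set.seed(2)
N <- 50
n <- 10
z <- rnorm(N)  # made-up values standing in for z_k

# First and second order inclusion probabilities under SRS without replacement
pik  <- rep(n / N, N)
pikl <- matrix(n * (n - 1) / (N * (N - 1)), N, N)
diag(pikl) <- pik  # pi_kk = pi_k

# General formula: sum_k sum_l pi_kl (z_k / pi_k)(z_l / pi_l) - (sum_k z_k)^2
e <- z / pik
v_general <- drop(t(e) %*% pikl %*% e) - sum(z)^2

# Closed form for SRS: N^2 (1 - n / N) S_z^2 / n, with S_z^2 using divisor N - 1
v_srs <- N^2 * (1 - n / N) * var(z) / n

all.equal(v_general, v_srs)
```

The general formula needs an N x N matrix, which for the 5000-unit populations in the examples would hold 25 million entries, whereas the closed form needs only two scalars and a variance.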
Value
A numeric value giving the expected variance of the general regression estimator for the desired design under the working and true models.
References
Bueno, E. (2018). A Comparison of Stratified Simple Random Sampling and Probability Proportional-to-size Sampling. Research Report, Department of Statistics, Stockholm University 2018:6. http://gauss.stat.su.se/rr/RR2018_6.pdf.
See Also
expvar for the simultaneous calculation of the expected variance of five sampling strategies under a superpopulation model; vargreg for the variance of the GREG estimator; desvar for the simultaneous calculation of the variance of six sampling strategies; optimApp for an interactive application of expgreg.
Examples
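# Three auxiliary variables, each sorted in increasing order
x1 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x2 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x3 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x <- cbind(x1, x2, x3)

# 'pips' design with inclusion probabilities proportional to x3
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x3)
# Same models, inclusion probabilities proportional to x2
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2)
# As above, with weights 1/x1 passed through '...'
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2, weights = 1/x1)

# Stratified simple random sampling: strata and allocation taken from optiallo()
st1 <- optiallo(n = 150, x = x3, H = 6)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# Same design, working-model exponents d12 = c(1, 0, 1)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# As above, with weights 1/x1 passed through '...'
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum,
        weights = 1/x1)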