R: Expected variance of the general regression estimator

expgreg {optimStrat}

R Documentation

Expected variance of the general regression estimator

Description

Compute the expected design variance of the general regression estimator of the total of a study variable under different sampling designs.

Usage

expgreg(x, b11, b12, b21, b22, d12, Rfy, n, design = NULL, 
        stratum = NULL, x_des = NULL, inc.p = NULL, ...)

Arguments

`x`	design matrix with the variables to be used in the GREG estimator.
`b11`	a numeric vector of length equal to the number of variables in `x` giving the coefficients of the trend term in the true superpopulation model (see ‘Details’).
`b12`	a numeric vector of length equal to the number of variables in `x` giving the exponents of the trend term in the true superpopulation model (see ‘Details’).
`b21`	a numeric vector of length equal to the number of variables in `x` giving the coefficients of the spread term in the true superpopulation model (see ‘Details’).
`b22`	a numeric vector of length equal to the number of variables in `x` giving the exponents of the spread term in the true superpopulation model (see ‘Details’).
`d12`	a numeric vector of length equal to the number of variables in `x` giving the exponents of the trend term in the assumed superpopulation model (see ‘Details’).
`Rfy`	a number giving the square root of the coefficient of determination between the auxiliary variables and the study variable.
`n`	either a positive number indicating the (expected) sample size (when `design` is one of 'srs', 'poi', 'pips' or `NULL`) or a numeric vector indicating the sample size of the strata to which each element belongs (when `design` is 'stsi') (see ‘Examples’).
`design`	a character string giving the sampling design. It must be one of 'srs' (simple random sampling without replacement), 'poi' (Poisson sampling), 'stsi' (stratified simple random sampling), 'pips' (Pareto `\pi`ps sampling) or `NULL` (see ‘Details’).
`stratum`	a vector indicating the stratum to which every unit belongs. Only used if `design` is 'stsi'.
`x_des`	a positive numeric vector giving the values of the auxiliary variable that is used for defining the inclusion probabilities. Only used if `design` is 'poi' or 'pips'.
`inc.p`	a matrix giving the first and second order inclusion probabilities. Only used if `design` is `NULL`.
`...`	other arguments passed to `lm` (see ‘Details’).

Details

The expected variance of the general regression estimator under different sampling designs is computed.

The assumed (working) superpopulation model is of the form

Y_{k} = f(x_{k}|\delta_{1}) + \epsilon_{k}

with E\epsilon_{k}=0, V\epsilon_{k}= \sigma_{0}^{2}g^{2}(x_{k}|\delta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.

The true generating model, however, is of the form

Y_{k} = f(x_{k}|\beta_{1}) + \epsilon_{k}

with E\epsilon_{k}=0, V\epsilon_{k}= \sigma^{2}g^{2}(x_{k}|\beta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.

where

f(x_{k}|\delta_{1}) = \sum_{j=1}^{J}\delta_{1,j}x_{jk}^{\delta_{1,J+j}},

g(x_{k}|\delta_{2}) = \sum_{j=1}^{J}\delta_{2,j}x_{jk}^{\delta_{2,J+j}},

f(x_{k}|\beta_{1}) = \sum_{j=1}^{J}\beta_{1,j}x_{jk}^{\beta_{1,J+j}},

g(x_{k}|\beta_{2}) = \sum_{j=1}^{J}\beta_{2,j}x_{jk}^{\beta_{2,J+j}}.

The parameters are supplied through the arguments as follows:

the coefficients \beta_{1,j} (j=1,\cdots,J) are given by b11;
the exponents \beta_{1,j} (j=J+1,\cdots,2J) are given by b12;
the coefficients \beta_{2,j} (j=1,\cdots,J) are given by b21;
the exponents \beta_{2,j} (j=J+1,\cdots,2J) are given by b22;
the exponents \delta_{1,j} (j=J+1,\cdots,2J) are given by d12.
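
As an illustration of how the coefficient and exponent vectors define the trend and spread terms, the following minimal sketch evaluates f and g for a small matrix. The helper name power_sum is hypothetical and is not part of optimStrat; it only mirrors the sums written above.

```r
# Hypothetical helper (not part of optimStrat): evaluates
# sum_j coefs[j] * x[, j]^expos[j] for every row of x.
power_sum <- function(x, coefs, expos) {
  x <- as.matrix(x)
  stopifnot(length(coefs) == ncol(x), length(expos) == ncol(x))
  # sweep() raises column j of x to the power expos[j]
  drop(sweep(x, 2, expos, "^") %*% coefs)
}

x <- cbind(x1 = c(1, 4, 9), x2 = c(2, 3, 5), x3 = c(4, 9, 16))

# Trend term with b11 = c(1, -1, 0) and b12 = c(1, 1, 0): x1 - x2
f_true <- power_sum(x, coefs = c(1, -1, 0), expos = c(1, 1, 0))

# Spread term with b21 = c(0, 0, 1) and b22 = c(0, 0, 0.5): sqrt(x3)
g_true <- power_sum(x, coefs = c(0, 0, 1), expos = c(0, 0, 0.5))
```

A coefficient of zero removes the corresponding variable from the term, whatever its exponent.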

The expected variance of the GREG estimator is approximated by

E\left(V\left(\hat{t}\right)\right) = V\left(\hat{t}_{z}\right) + \hat{\sigma}^{2}\sum_{k=1}^{N}\left(\frac{1}{\pi_{k}}-1\right)g^{2}(x_{k}|\beta_{2})

where

V\left(\hat{t}_{z}\right) = \sum_{k=1}^{N}\sum_{l=1}^{N}\pi_{kl}\frac{z_{k}}{\pi_{k}}\frac{z_{l}}{\pi_{l}} - \left(\sum_{k=1}^{N}z_{k}\right)^{2}

and

\hat{\sigma}^{2} = \frac{S^{2}_{f}}{\bar{g^{2}}}\left(\frac{1}{R^{2}_{fy}}-1\right),

z_{k} = \left(x_{k}^{\beta}-x_{k}^{\delta}A\right)\beta_{1}^{**},

S^{2}_{f} = \sum_{k=1}^{N}(f(x_{k}|\beta_{1})-\bar{f})^{2}/N,

\bar{g^{2}} = \sum_{k=1}^{N}g(x_{k}|\beta_{2})^{2}/N,

x_{k}^{\beta} = \left(x_{1k}^{\beta_{1,J+1}},\cdots,x_{Jk}^{\beta_{1,2J}}\right),

x_{k}^{\delta} = \left(x_{1k}^{\delta_{1,J+1}},\cdots,x_{Jk}^{\delta_{1,2J}}\right),

\beta_{1}^{**} = (\beta_{1,1},\cdots,\beta_{1,J})',

A = \left(\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\delta}\right)^{-1}\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\beta}.

N is the population size, and \pi_{k} and \pi_{kl} are, respectively, the first- and second-order inclusion probabilities. w_{k} is a weight associated with each element; it represents the inverse of the conditional variance (up to a scalar) of the assumed (working) superpopulation model and can be supplied through the weights argument that is passed on to lm (see ‘Examples’).
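
The formulas above can be traced step by step in base R. The sketch below is only a direct transcription of them for a small population with a full matrix of inclusion probabilities; it is not the internal code of expgreg, and the names col_powers, expected_greg_variance, P and w are assumptions made for the illustration. In particular, how the package itself fits the working model through lm (for example, whether an intercept is included) is not stated here, so results need not match expgreg exactly.

```r
# Hypothetical helpers, written from the formulas in this section.
col_powers <- function(x, expos) sweep(as.matrix(x), 2, expos, "^")

expected_greg_variance <- function(x, b11, b12, b21, b22, d12, Rfy, P,
                                   w = rep(1, nrow(x))) {
  pik <- diag(P)                      # first-order inclusion probabilities
  Xb  <- col_powers(x, b12)           # rows are x_k^beta
  Xd  <- col_powers(x, d12)           # rows are x_k^delta
  f   <- drop(Xb %*% b11)             # f(x_k | beta_1)
  g2  <- drop(col_powers(x, b22) %*% b21)^2   # g^2(x_k | beta_2)

  # A = (sum_k w_k x_k^delta' x_k^delta)^{-1} sum_k w_k x_k^delta' x_k^beta
  A <- solve(crossprod(Xd, w * Xd), crossprod(Xd, w * Xb))
  z <- drop((Xb - Xd %*% A) %*% b11)  # z_k = (x_k^beta - x_k^delta A) beta_1^**

  # sigma^2 = S_f^2 / mean(g^2) * (1 / R_fy^2 - 1), with divisor N in S_f^2
  sigma2 <- mean((f - mean(f))^2) / mean(g2) * (1 / Rfy^2 - 1)

  # V(t_z) = sum_k sum_l pi_kl (z_k / pi_k)(z_l / pi_l) - (sum_k z_k)^2
  v_z <- sum(P * outer(z / pik, z / pik)) - sum(z)^2
  v_z + sigma2 * sum((1 / pik - 1) * g2)
}

# Small population under simple random sampling without replacement:
# pi_k = n / N and pi_kl = n (n - 1) / (N (N - 1)).
set.seed(1)
N <- 6; n <- 3
x <- cbind(x1 = runif(N, 1, 10), x2 = runif(N, 1, 10))
P <- matrix(n * (n - 1) / (N * (N - 1)), N, N)
diag(P) <- n / N

v <- expected_greg_variance(x, b11 = c(1, 2), b12 = c(1, 1),
                            b21 = c(1, 0), b22 = c(0.5, 0),
                            d12 = c(1, 1), Rfy = 0.8, P = P)
```

In this call the assumed exponents d12 equal the true exponents b12, so A is the identity matrix, every z_{k} is zero up to rounding, and only the second term of the expected variance remains.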

If design=NULL, the matrix of inclusion probabilities is taken proportional to the matrix inc.p. If design is other than NULL, the formula for the variance simplifies in such a way that the matrix of inclusion probabilities is no longer necessary. In particular:

if design='srs', only the sample size n is required;
if design='stsi', both the stratum ID, stratum, and the sample size per stratum, n, are required;
if design is either 'pips' or 'poi', the inclusion probabilities are obtained proportional to the values of x_des, corrected if necessary.
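
The correction mentioned for 'pips' and 'poi' is not spelled out in this page. A common approach, shown here only as an assumption about what such a correction might look like, is to set \pi_{k}=1 for every unit whose proportional value exceeds one and to redistribute the remaining sample size among the other units. The function name inclusion_probs is hypothetical and is not part of optimStrat.

```r
# Hypothetical sketch: inclusion probabilities proportional to x_des,
# capped at 1 by iteratively moving oversized units into a "certain" set.
inclusion_probs <- function(x_des, n) {
  stopifnot(all(x_des > 0), n > 0, n <= length(x_des))
  pik <- numeric(length(x_des))
  certain <- rep(FALSE, length(x_des))   # units forced to pi_k = 1
  repeat {
    n_left <- n - sum(certain)           # sample size left to distribute
    pik[certain] <- 1
    pik[!certain] <- n_left * x_des[!certain] / sum(x_des[!certain])
    newly <- !certain & pik > 1
    if (!any(newly)) break
    certain <- certain | newly
  }
  pik
}

# One very large unit: n * x / sum(x) would give 1.6 for the last element,
# so it is included with certainty and the rest share the remaining size.
pik <- inclusion_probs(c(1, 1, 1, 1, 16), n = 2)
```

After the correction the probabilities still sum to the (expected) sample size n.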

Value

A numeric value giving the expected variance of the general regression estimator for the desired design under the working and true models.

References

Bueno, E. (2018). A Comparison of Stratified Simple Random Sampling and Probability Proportional-to-size Sampling. Research Report, Department of Statistics, Stockholm University 2018:6. http://gauss.stat.su.se/rr/RR2018_6.pdf.

Examples

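library(optimStrat)
set.seed(1)  # for reproducibility

# Three auxiliary variables; sorting makes them perfectly rank-correlated
x1 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x2 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x3 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x <- cbind(x1, x2, x3)

# Pareto pips sampling with inclusion probabilities proportional to x3
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x3)
# Same models, inclusion probabilities proportional to x2
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2)
# As above, with weights 1/x1 passed on to lm
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2, weights = 1/x1)

# Stratified simple random sampling; strata and sample sizes taken from optiallo
st1 <- optiallo(n = 150, x = x3, H = 6)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# Assumed exponents d12 that differ from the true exponents b12
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# As above, with weights 1/x1 passed on to lm
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum,
        weights = 1/x1)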

[Package optimStrat version 2.4 Index]