R: Compute small area estimates based on the basic unit-level...

fSAE.Unit {hbsae}

R Documentation

Compute small area estimates based on the basic unit-level model.

Description

This is the function that carries out most of the computational work. It computes small area estimates based on the basic unit-level model, also known as the Battese-Harter-Fuller model, although it is also called by fSurvReg and fSAE.Area to compute survey regression or area-level model small area estimates. By default, Hierarchical Bayes estimates are computed, using fast one-dimensional numerical integration to average over the posterior density for the ratio of between and within area variance. This way, the small area estimates and MSEs account for the uncertainty about this parameter. Besides hierarchical Bayes, REML and hybrid methods are supported. These methods use the REML estimate or posterior mean of the variance ratio, respectively, as a plug-in estimate. Both methods do not account for uncertainty about this parameter. Synthetic estimates are computed by setting the variance ratio to zero.

Usage

fSAE.Unit(
  y,
  X,
  area,
  Narea = NULL,
  Xpop = NULL,
  fpc = TRUE,
  v = NULL,
  vpop = NULL,
  w = NULL,
  wpop = NULL,
  method = "HB",
  beta0 = rep(0, ncol(X)),
  Omega0 = Diagonal(n = ncol(X), x = 0),
  nu0 = 0,
  s20 = 0,
  prior = function(x) rep.int(1L, length(x)),
  CV = prod(dim(X)) < 1e+06,
  CVweights = NULL,
  silent = FALSE,
  keep.data = FALSE,
  full.cov = nrow(Xpop) < 1000L,
  lambda0 = NULL,
  rel.int.tol = 0.01,
  ...
)

Arguments

`y`	response vector of length n.
`X`	n x p model matrix.
`area`	n-vector of area codes, typically a factor variable with m levels, where m is the number of in-sample areas.
`Narea`	M-vector of area population sizes, where M is the number of areas for which estimates are required. There should be a one-to-one correspondence with the rows of `Xpop`. This argument is required unless `Xpop=NULL` or `fpc=FALSE`.
`Xpop`	M x p matrix of population means. If `Xpop` is not provided, only the model fit is returned.
`fpc`	whether a finite population correction should be used. Default is `TRUE`.
`v`	unit-level variance structure, n-vector. Defaults to a vector of 1s. In some cases it might be useful to take v proportional to the sampling probabilities.
`vpop`	population area means of v, M-vector. Defaults to a vector of 1s. Not used when `fpc` is `FALSE`.
`w`	area-level variance structure, m-vector. Defaults to a vector of 1s.
`wpop`	area-level variance structure, M-vector. Defaults to a vector of 1s. Only components of `wpop` corresponding to out-of-sample areas are actually used.
`method`	one of "HB", "hybrid", "REML", "synthetic", "survreg", "BLUP" where "HB" (default) does the full hierarchical Bayes computation, i.e. numerical integration over the posterior density for the between area variance parameter, "hybrid" computes the Best Linear Unbiased Predictor (BLUP) with the posterior mean for the variance parameter plugged in, "REML" computes the BLUP with the restricted maximum likelihood estimate of the variance parameter plugged in, "synthetic" computes synthetic estimates where the between area variance is set to 0, and "survreg" computes survey regression estimates where the between area variance approaches infinity. "BLUP" computes BLUP estimates with the value provided for `lambda0` as a fixed plug-in value for the ratio of between and within area variance. Only method "HB" takes uncertainty about the between-area variance into account.
`beta0`	mean vector of normal prior for coefficient vector.
`Omega0`	inverse covariance matrix of normal prior for coefficient vector. Default prior corresponds to the (improper) uniform distribution.
`nu0`	degrees of freedom parameter for inverse gamma prior for residual (within-area) variance. Default is 0.
`s20`	scale parameter for inverse gamma prior for residual (within-area) variance. Default is 0.
`prior`	prior density for the ratio lambda = between-area-variance / within-area variance. This should be a (vectorized) function that takes a vector lambda and returns a vector of prior density values at lambda. The density does not have to be normalized. The default is the (improper) uniform prior. The within-area variance and lambda are assumed independent a priori.
`CV`	whether (an approximation to the) leave-one-out cross-validation measure should be computed. As this requires the computation of a dense matrix the size of `X`, the default is to set `CV` to `FALSE` if the size of `X` is larger than a certain lower bound.
`CVweights`	n-vector of weights to use for CV computation.
`silent`	if `FALSE`, plot the posterior density for the variance ratio.
`keep.data`	if `TRUE` return the input data (y,X,area,Xpop). This is required input for the cross-validation function `CVArea`.
`full.cov`	if `TRUE` compute the full covariance matrix for the small area estimates. The computed correlations do not account for uncertainty about the variance ratio.
`lambda0`	optional starting value for the ratio of between and within-area variance used in the numerical routines. If `method="BLUP"` then this value will instead be used as a fixed plug-in value.
`rel.int.tol`	tolerance for the estimated relative integration error (default is 1 percent). A warning is issued if the estimated relative error exceeds this value.
`...`	additional control parameters passed to function `integrate`.

Details

The default Hierarchical Bayes method uses numerical integration (as provided by function integrate) to compute small area estimates and MSEs. The model parameters returned, such as fixed and random effects, are currently not averaged over the posterior distribution for the variance ratio. They are evaluated at the posterior mean of the variance ratio.

Value

An object of class sae containing the small area estimates and MSEs, the model fit, and model selection measures.

References

G.E. Battese, R.M. Harter and W.A. Fuller (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association, 83(401), 28-36.

G.S. Datta and M. Ghosh (1991). Bayesian Prediction in Linear Models: Applications to Small Area Estimation. The Annals of Statistics 19(4), 1748-1770.

J.N.K. Rao and I. Molina (2015). Small Area Estimation. Wiley.

Examples

d <- generateFakeData()

# generate design matrix, variable of interest, area indicator and population data
dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop,
            type="data")

# compute small area estimates based on the basic unit-level model
sae <- fSAE.Unit(dat$y, dat$X, dat$area, dat$Narea, dat$PopMeans)
EST(sae)  # estimates
RMSE(sae)  # standard errors

[Package hbsae version 1.2 Index]