fh {povmap} | R Documentation |
Standard and Extended Fay-Herriot Models for Disaggregated Indicators
Description
Function fh estimates indicators using the Fay-Herriot approach by
Fay and Herriot (1979). Empirical best linear unbiased predictors
(EBLUPs) and mean squared error (MSE) estimates are provided. Additionally,
different extensions of the standard Fay-Herriot model are available:
Adjusted estimation methods for the variance of the random effects (see
Li and Lahiri (2010) and Yoshimori and Lahiri (2014)) are
offered. Log and arcsin transformations of the dependent variable and two
types of backtransformation can be chosen: a crude version and the one
introduced by Slud and Maiti (2006) for log-transformed variables,
and a naive and a bias-corrected version following Hadam et al. (2020)
for arcsin-transformed variables. A spatial extension to the Fay-Herriot
model following Petrucci and Salvati (2006) is also included. In
addition, it is possible to estimate a robust version of the standard and of
the spatial model (see Warnholz (2016)). Finally, a Fay-Herriot model
can be estimated when the auxiliary information is measured with error
following Ybarra and Lohr (2008).
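The shrinkage idea behind the standard Fay-Herriot EBLUP can be illustrated with a minimal base R sketch. It is not the internal code of fh: the data are invented, the random-effect variance sigma2u is simply assumed rather than estimated, and all object names are hypothetical.

```r
# Toy area-level data (all values invented for illustration)
direct  <- c(10.2, 12.5, 9.8, 11.1, 13.0)  # direct estimates per domain
vardir  <- c(1.0, 0.5, 2.0, 0.8, 1.5)      # known sampling variances
x       <- c(1.0, 1.6, 0.9, 1.2, 1.7)      # one auxiliary variable
sigma2u <- 0.6                             # ASSUMED random-effect variance

# Weighted least squares for the regression coefficients
X    <- cbind(1, x)
w    <- 1 / (sigma2u + vardir)
beta <- solve(t(X) %*% (w * X), t(X) %*% (w * direct))

# Regression-synthetic part and shrinkage factor per domain
synthetic <- as.vector(X %*% beta)
gamma     <- sigma2u / (sigma2u + vardir)

# The predictor is a weighted average of direct and synthetic estimates
eblup <- gamma * direct + (1 - gamma) * synthetic
```

Domains with a large sampling variance receive a small gamma, so their prediction leans more on the regression-synthetic part.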
Usage
fh(
  fixed,
  vardir,
  combined_data,
  domains = NULL,
  method = "reml",
  interval = NULL,
  k = 1.345,
  mult_constant = 1,
  transformation = "no",
  backtransformation = NULL,
  eff_smpsize = NULL,
  correlation = "no",
  corMatrix = NULL,
  Ci = NULL,
  tol = 1e-04,
  maxit = 100,
  MSE = FALSE,
  mse_type = "analytical",
  B = c(50, 0),
  seed = 123
)
Arguments
fixed |
a two-sided linear formula object describing the fixed-effects part of the linear mixed regression model with the dependent variable on the left of a ~ operator and the explanatory variables on the right, separated by + operators. |
vardir |
a character string indicating the name of the variable
containing the domain-specific sampling variances of the direct estimates
that are included in |
combined_data |
a data set containing all the input variables that are needed for the estimation of the Fay-Herriot model: the direct estimates, the sampling variances, the explanatory variables and the domains. In addition, the effective sample size needs to be included, if the arcsin transformation is chosen. |
domains |
a character string indicating the domain variable that is
included in |
method |
a character string describing the method for the estimation of
the variance of the random effects. Methods that can be chosen
(i) restricted maximum likelihood (REML) method (" |
interval |
optional argument, if method " |
k |
numeric tuning constant. Required argument when the robust version
of the standard or spatial Fay-Herriot model is chosen. Defaults to
|
mult_constant |
numeric multiplier constant used in the bias corrected
version of the robust estimation methods. Required argument when the robust
version of the standard or spatial Fay-Herriot model is chosen. Default is to
make no correction for realizations of direct estimator within
|
transformation |
a character that determines the type of transformation
of the dependent variable and of the sampling variances. Methods that can be
chosen (i) no transformation (" |
backtransformation |
a character that determines the type of
backtransformation of the EBLUPs and MSE estimates. Required argument when a
transformation is chosen. Available methods are (i) crude bias-correction
following Rao (2015) when the log transformation is chosen
(" |
eff_smpsize |
a character string indicating the name of the variable
containing the effective sample sizes that are included in
|
correlation |
a character determining the correlation structure of the
random effects. Possible correlations are
(i) no correlation (" |
corMatrix |
matrix or data frame with dimensions number of areas times
number of areas containing the row-standardized proximities between the
domains. Values must lie between |
Ci |
array with dimension number of estimated regression coefficients
times number of estimated regression coefficients times number of areas
containing the variance-covariance matrix of the explanatory variables for
each area. For an example of how to create the array, please refer to the
vignette. Required argument within the Ybarra-Lohr model
( |
tol |
a number determining the tolerance value for the estimation of the
variance of the random effects. Required argument when method " |
maxit |
a number determining the maximum number of iterations for the
estimation of the variance of the random effects. Required argument when
method " |
MSE |
if |
mse_type |
a character string determining the estimation method of the
MSE. Methods that can be chosen
(i) analytical MSE depending on the estimation method of the variance of the
random effect (" |
B |
either a single number or a numeric vector with two elements. The
single number or the first element defines the number of bootstrap iterations
when a bootstrap MSE estimator is chosen. When the standard FH
model is applied and the information criteria by Marhuenda et al. (2014)
should be computed, the second element of |
seed |
an integer to set the seed for the random number generator. For
the usage of random number generation see details. If seed is set to
|
Details
In the bootstrap approaches, random number generation is used. Thus,
a seed is set by the argument seed.
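A minimal sketch of why a seed matters for a bootstrap follows. The helper boot_direct is hypothetical and is not part of the package; treating a NULL seed as "do not fix the seed" is an assumption of this sketch only.

```r
# Hypothetical helper: draws B parametric bootstrap replicates of the
# direct estimates from normal distributions with the given variances.
boot_direct <- function(mu, vardir, B = 50, seed = 123) {
  # ASSUMPTION of this sketch: a NULL seed leaves the generator untouched
  if (!is.null(seed)) set.seed(seed)
  # One column per bootstrap replicate, one row per domain
  replicate(B, rnorm(length(mu), mean = mu, sd = sqrt(vardir)))
}

mu     <- c(10, 12, 9)        # invented domain means
vardir <- c(1.0, 0.5, 2.0)    # invented sampling variances

run1 <- boot_direct(mu, vardir)
run2 <- boot_direct(mu, vardir)
same_draws <- identical(run1, run2)  # both runs reuse the same seed
```

With the same seed, repeated calls produce the same replicates, which makes bootstrap MSE estimates reproducible.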
Out-of-sample EBLUPs are available for all area-level models except for the
bc_sm backtransformation and for the robust models.
Out-of-sample MSEs are available for the analytical MSE estimator of the
standard Fay-Herriot model with reml and ml variance estimation, the crude
backtransformation in case of log transformation and the bootstrap MSE
estimator for the arcsin transformation.
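The transformations and the crude and naive backtransformations mentioned above can be sketched in base R. These are common textbook forms (delta-method variance for the log, variance 1/(4 n) for the arcsin square root) and are stated here as assumptions; they are not guaranteed to match the package internals, and all object and function names are hypothetical.

```r
# Invented direct estimates of a proportion, with variances and
# effective sample sizes
direct <- c(0.12, 0.30, 0.25)
vardir <- c(0.0004, 0.0009, 0.0006)
n_eff  <- c(150, 220, 180)

# Log transformation: the variance follows from the delta method
log_direct <- log(direct)
log_vardir <- vardir / direct^2

# Crude backtransformation of a log-scale prediction and its MSE
crude_back <- function(pred_log, mse_log) exp(pred_log + 0.5 * mse_log)

# Arcsin square-root transformation with variance 1 / (4 * n_eff)
asin_direct <- asin(sqrt(direct))
asin_vardir <- 1 / (4 * n_eff)

# Naive backtransformation simply inverts the transformation
naive_back <- function(pred_asin) sin(pred_asin)^2
```

The crude version always exceeds the plain exponential of the log-scale prediction whenever the log-scale MSE is positive, which is the point of the correction.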
For a description of how to create the proximity matrix for the
spatial Fay-Herriot model, see the package vignette. In the presence
of out-of-sample domains, the proximity matrix needs to be
subset to the in-sample domains.
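As a rough illustration of subsetting a proximity matrix, the following base R sketch builds a hypothetical binary contiguity matrix for four invented domains, row-standardizes it, and restricts it to the in-sample domains. Whether rows should be re-standardized after subsetting is not stated here; the sketch does so as an assumption. The vignette remains the reference for the actual workflow.

```r
# Hypothetical binary contiguity matrix (1 = domains are neighbours)
dom <- c("A", "B", "C", "D")
W <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 1,
              1, 1, 0, 1,
              0, 1, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(dom, dom))

# Row-standardize: every row sums to one
W_std <- W / rowSums(W)

# Suppose domain "D" has no sample: keep only the in-sample domains
in_sample <- c("A", "B", "C")
W_in <- W[in_sample, in_sample]

# ASSUMPTION: re-standardize so that rows sum to one again
W_in_std <- W_in / rowSums(W_in)
```

The resulting W_in_std has one row and one column per in-sample domain, in matching order.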
Value
An object of class "fh", "emdi" that provides estimators
for regional disaggregated indicators like means and ratios and optionally
corresponding MSE estimates. Several generic functions have methods for the
returned object. For a full list and descriptions of the components of
objects of class "emdi", see emdiObject.
References
Chen, S. and Lahiri, P. (2002), A weighted jackknife MSPE estimator in small-area
estimation, "Proceedings of the Section on Survey Research Methods", American
Statistical Association, 473-477.
Datta, G. S. and Lahiri, P. (2000), A unified measure of uncertainty of
estimated best linear unbiased predictors in small area estimation problems,
Statistica Sinica 10(2), 613-627.
Fay, R. E. and Herriot, R. A. (1979), Estimates of income for small places:
An application of James-Stein procedures to census data, Journal of the
American Statistical Association 74(366), 269-277.
González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D. and
Santamaría, L. (2008) Analytic and bootstrap approximations of prediction
errors under a multivariate Fay-Herriot model. Computational Statistics &
Data Analysis, 52, 5242–5252.
Hadam, S., Wuerz, N. and Kreutzmann, A.-K. (2020), Estimating
regional unemployment with mobile network data for Functional Urban Areas in
Germany, Refubium - Freie Universitaet Berlin Repository, 1-28.
Jiang, J., Lahiri, P., Wan, S.-M. and Wu, C.-H. (2001), Jackknifing in the
Fay–Herriot model with an example. In Proc. Sem. Funding Opportunity in
Survey Research, Washington DC: Bureau of Labor Statistics, 75–97.
Jiang, J., Lahiri, P. and Wan, S.-M. (2002), A unified jackknife theory for
empirical best prediction with M-estimation, Ann. Statist., 30,
1782-1810.
Li, H. and Lahiri, P. (2010), An adjusted maximum likelihood method for
solving small area estimation problems, Journal of Multivariate Analysis 101,
882-902.
Marhuenda, Y., Morales, D. and Pardo, M.C. (2014). Information criteria for
Fay-Herriot model selection. Computational Statistics and Data Analysis 70,
268-280.
Neves, A., Silva, D. and Correa, S. (2013), Small domain estimation for the
Brazilian service sector survey, ESTADISTICA 65(185), 13-37.
Prasad, N. and Rao, J. (1990), The estimation of the mean squared error of
small-area estimators, Journal of the American Statistical
Association 85(409), 163-171.
Petrucci, A., Salvati, N. (2006), Small Area Estimation for Spatial
Correlation in Watershed Erosion Assessment, Journal of Agricultural,
Biological and Environmental Statistics, 11(2), 169–182.
Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.
Rao, J. N. K. and Molina, I. (2015), Small area estimation,
New York: Wiley.
Slud, E. and Maiti, T. (2006), Mean-squared error estimation in transformed
Fay-Herriot models, Journal of the Royal Statistical Society: Series B 68(2),
239-257.
Warnholz, S. (2016), saeRobust: Robust small area estimation.
R package.
Warnholz, S. (2016b). Small area estimation using robust extensions to area
level models. Ph.D. thesis, Freie Universitaet Berlin.
Ybarra, L. and Lohr, S. (2008), Small area estimation when auxiliary
information is measured with error, Biometrika, 95(4), 919-931.
Yoshimori, M. and Lahiri, P. (2014), A new adjusted maximum likelihood method
for the Fay-Herriot small area model, Journal of Multivariate Analysis 124,
281-294.
Examples
# Loading data - population and sample data
data("eusilcA_popAgg")
data("eusilcA_smpAgg")
# Combine sample and population data
combined_data <- combine_data(
  pop_data = eusilcA_popAgg,
  pop_domains = "Domain",
  smp_data = eusilcA_smpAgg,
  smp_domains = "Domain"
)
# Example 1: Standard Fay-Herriot model and analytical MSE
fh_std <- fh(
  fixed = Mean ~ cash + self_empl, vardir = "Var_Mean",
  combined_data = combined_data, domains = "Domain", method = "ml",
  MSE = TRUE
)
# Example 2: arcsin transformation of the dependent variable
fh_arcsin <- fh(
  fixed = MTMED ~ cash + age_ben + rent + house_allow,
  vardir = "Var_MTMED", combined_data = combined_data, domains = "Domain",
  method = "ml", transformation = "arcsin", backtransformation = "bc",
  eff_smpsize = "n", MSE = TRUE, mse_type = "boot", B = c(50, 0)
)
# Example 3: Spatial Fay-Herriot model
# Load proximity matrix
data("eusilcA_prox")
fh_spatial <- fh(
  fixed = Mean ~ cash + self_empl, vardir = "Var_Mean",
  combined_data = combined_data, domains = "Domain", method = "reml",
  correlation = "spatial", corMatrix = eusilcA_prox, MSE = TRUE,
  mse_type = "analytical"
)
# Example 4: Robust Fay-Herriot model
fh_robust <- fh(
  fixed = Mean ~ cash + self_empl, vardir = "Var_Mean",
  combined_data = combined_data, domains = "Domain", method = "reblupbc",
  k = 1.345, mult_constant = 1, MSE = TRUE, mse_type = "pseudo"
)
# Example 5: Ybarra-Lohr model
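# Create the array of variance-covariance matrices of the explanatory
# variables, one (P + 1) x (P + 1) slice per area
P <- 1 # number of explanatory variables (the intercept adds one dimension)
M <- length(eusilcA_smpAgg$Mean) # number of areas
Ci_array <- array(data = 0, dim = c(P + 1, P + 1, M))
for (i in seq_len(M)) {
  # Only Cash is measured with error; the intercept entries stay zero
  Ci_array[2, 2, i] <- eusilcA_smpAgg$Var_Cash[i]
}
fh_yl <- fh(
  fixed = Mean ~ Cash, vardir = "Var_Mean",
  combined_data = eusilcA_smpAgg, domains = "Domain", method = "me",
  Ci = Ci_array, MSE = TRUE, mse_type = "jackknife"
)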