
fit_sae {tipsae}

R Documentation

Fitting a Small Area Model

Description

fit_sae() is used to fit Beta-based small area models, such as the classical Beta, the zero- and/or one-inflated Beta, the Flexible Beta and the Extended Beta models. The random effect part can include a temporal and/or a spatial dependency structure, specified through the prior settings. In addition, different prior assumptions can be specified for the unstructured random effects, including robust and shrinking priors, and different parametrizations of the sampling dispersion can be chosen.
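For orientation, the sketch below evaluates a Beta sampling density in the mean-precision form commonly used in Beta small area models (e.g. Janicki 2020): mean θ and precision φ, so that shape1 = θφ and shape2 = (1 − θ)φ. That tipsae uses exactly this parametrization is an assumption here, consistent with the φ_d + 1 (`"neff"`) option of `type_disp`. The helper names `dbeta_mp` and `var_beta_mp` are hypothetical and only base R is used.

```r
# Hypothetical helpers (not part of tipsae): Beta in mean-precision form
dbeta_mp <- function(y, theta, phi) {
  # theta in (0, 1) is the mean, phi > 0 is the precision
  dbeta(y, shape1 = theta * phi, shape2 = (1 - theta) * phi)
}

var_beta_mp <- function(theta, phi) {
  a <- theta * phi
  b <- (1 - theta) * phi
  # Variance of Beta(a, b) written in terms of the shape parameters
  a * b / ((a + b)^2 * (a + b + 1))
}

theta <- 0.2
phi <- 49
v <- var_beta_mp(theta, phi)
# The shape-based variance equals theta * (1 - theta) / (phi + 1)
all.equal(v, theta * (1 - theta) / (phi + 1))  # TRUE
# phi + 1 can be recovered from the mean and the variance
neff <- theta * (1 - theta) / v
```

Under this assumption, a direct variance and the quantity φ + 1 carry the same information once the mean is known, which is why the two `type_disp` options are interchangeable parametrizations.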

Usage

fit_sae(
  formula_fixed,
  data,
  domains = NULL,
  disp_direct,
  type_disp = c("neff", "var"),
  domain_size = NULL,
  household_size = NULL,
  likelihood = c("beta", "flexbeta", "Infbeta0", "Infbeta1", "Infbeta01", "ExtBeta"),
  prior_coeff = c("normal", "HorseShoe"),
  p0_HorseShoe = NULL,
  prior_reff = c("normal", "t", "VG"),
  spatial_error = FALSE,
  spatial_df = NULL,
  domains_spatial_df = NULL,
  temporal_error = FALSE,
  temporal_variable = NULL,
  scale_prior = list(Unstructured = 2.5, Spatial = 2.5, Temporal = 2.5, Coeff. = 2.5),
  adapt_delta = 0.95,
  max_treedepth = 10,
  init = "0",
  ...
)

Arguments

`formula_fixed`	An object of class `"formula"` specifying the linear regression fixed part at the linking level.
`data`	An object of class `"data.frame"` containing all relevant quantities.
`domains`	Name of the data column containing the domain names. If `NULL` (default), the domains are labelled with sequential numbers.
`disp_direct`	Name of the data column containing the given sampling dispersion of each domain. For out-of-sample areas, the dispersion must be `NA`.
`type_disp`	Parametrization of the dispersion parameter. The choices are the variance (`"var"`) or the effective sample size φ_d + 1 (`"neff"`, default).
`domain_size`	Name of the data column containing the domain sizes (optional). For out-of-sample areas, the sizes must be `NA`.
`household_size`	Name of the data column containing the number of sampled households. Required if `likelihood = "ExtBeta"`.
`likelihood`	Sampling likelihood to be used. The choices are `"beta"` (default), `"flexbeta"`, `"Infbeta0"`, `"Infbeta1"`, `"Infbeta01"` and `"ExtBeta"`.
`prior_coeff`	Prior distribution of the regression coefficients. The choices are `"normal"` (default) or `"HorseShoe"`.
`p0_HorseShoe`	Expected number of relevant covariates. Required if `prior_coeff = "HorseShoe"`.
`prior_reff`	Prior distribution of the unstructured random effect. The choices are `"normal"` (default), `"t"` (Student's t) and `"VG"` (variance gamma).
`spatial_error`	Logical indicating whether to include a spatially structured random effect.
`spatial_df`	Object of class `SpatialPolygonsDataFrame` or `sf` with the shapefile of the studied region. Required if `spatial_error = TRUE`.
`domains_spatial_df`	Column name of the `spatial_df` object displaying the domain names. Required if `spatial_error = TRUE`.
`temporal_error`	Logical indicating whether to include a temporally structured random effect.
`temporal_variable`	Data column name indicating the temporal variable. Required if `temporal_error = TRUE`.
`scale_prior`	List with the values of the prior scales. Four named elements must be provided: `"Unstructured"`, `"Spatial"`, `"Temporal"`, `"Coeff."`. Default: all equal to 2.5.
`adapt_delta`	HMC option: target average proposal acceptance probability. See `stan` documentation.
`max_treedepth`	HMC option: maximum tree depth of the NUTS sampler. See `stan` documentation.
`init`	Initial values specification. See the detailed documentation for the init argument in `stan`.
`...`	Arguments passed to `sampling` (e.g. iter, chains).
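As a sketch of how the dispersion-related arguments fit together, the base R code below prepares a hypothetical input table. It converts direct variance estimates into the φ_d + 1 scale used by `type_disp = "neff"`, assuming the mean-precision Beta relation Var = θ(1 − θ)/(φ + 1); it leaves an out-of-sample domain with `NA` dispersion; and it builds a `scale_prior` list with the four required names. The data values and the column name `neff` are invented for illustration, and the final `fit_sae()` call is only shown as a comment.

```r
# Hypothetical direct estimates for four domains; domain "D" is out-of-sample
dir_est <- data.frame(
  id   = c("A", "B", "C", "D"),
  hcr  = c(0.12, 0.25, 0.18, NA),
  x    = c(0.4, 0.9, 0.6, 0.7),
  vars = c(0.0010, 0.0025, 0.0015, NA)
)

# Assumed relation Var = theta * (1 - theta) / (phi + 1), hence
# phi + 1 = theta * (1 - theta) / Var; NA propagates for domain "D"
dir_est$neff <- with(dir_est, hcr * (1 - hcr) / vars)

# Prior scales: all four names must be present; here only the
# unstructured random effect scale departs from the default 2.5
scale_prior <- list(Unstructured = 1, Spatial = 2.5, Temporal = 2.5, Coeff. = 2.5)
stopifnot(setequal(names(scale_prior),
                   c("Unstructured", "Spatial", "Temporal", "Coeff.")))

# With tipsae installed, these inputs could then be passed as, e.g.:
# fit_sae(formula_fixed = hcr ~ x, data = dir_est, domains = "id",
#         disp_direct = "neff", type_disp = "neff",
#         scale_prior = scale_prior, chains = 1, iter = 150)
```

Alternatively, the variances could be passed directly with `disp_direct = "vars"` and `type_disp = "var"`, as in the examples below; the conversion only matters when the dispersion is available on one scale and the other is preferred.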

Value

A list of class fitsae containing the following objects:

model_settings: A list summarizing all the assumptions of the model: sampling likelihood, presence of intercept, dispersion parametrization, random effects priors and possible structures.
data_obj: A list containing input objects including in-sample and out-of-sample relevant quantities.
stanfit: A stanfit object, the output of the sampling function, containing the full posterior draws. For details, see the stan documentation.
pars_interest: A vector containing the names of parameters whose posterior samples are stored.
call: Image of the function call that produced the fitsae object.
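The stanfit element holds the posterior draws. As a sketch of a typical post-processing step, the base R code below summarizes a draws matrix (iterations in rows, domain-level parameters in columns) with posterior means and 95% equal-tailed credible intervals. The matrix is simulated here and the column names `theta[1]`, `theta[2]`, `theta[3]` are hypothetical; with a real fit, rstan's `as.matrix()` method for stanfit objects returns a matrix of this kind, and pars_interest indicates which parameters were stored. The helper `summarize_post` is likewise hypothetical.

```r
set.seed(1)
n_draws <- 1000
# Simulated stand-in for posterior draws of three domain-level proportions
draws <- cbind(
  "theta[1]" = rbeta(n_draws, 12, 88),
  "theta[2]" = rbeta(n_draws, 25, 75),
  "theta[3]" = rbeta(n_draws, 18, 82)
)

# Hypothetical helper: posterior mean and equal-tailed interval per column
summarize_post <- function(draws, prob = 0.95) {
  alpha <- (1 - prob) / 2
  t(apply(draws, 2, function(x) {
    c(mean  = mean(x),
      lower = unname(quantile(x, alpha)),
      upper = unname(quantile(x, 1 - alpha)))
  }))
}

post <- summarize_post(draws)
print(round(post, 3))
```

Each row of `post` corresponds to one parameter, so the same helper works unchanged on any subset of columns selected from a real draws matrix.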

References

Janicki R (2020). “Properties of the beta regression model for small area estimation of proportions and application to estimation of poverty rates.” Communications in Statistics-Theory and Methods, 49(9), 2264–2284.

Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017). “Stan: A probabilistic programming language.” Journal of Statistical Software, 76(1), 1–32.

Morris M, Wheeler-Martin K, Simpson D, Mooney SJ, Gelman A, DiMaggio C (2019). “Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan.” Spatial and Spatio-Temporal Epidemiology, 31, 100301.

De Nicolò S, Ferrante MR, Pacei S (2023). “Small area estimation of inequality measures using mixtures of Beta.” doi:10.1093/jrsssa/qnad083.

De Nicolò S, Gardini A (2024). “The R Package tipsae: Tools for Mapping Proportions and Indicators on the Unit Interval.” Journal of Statistical Software, 108(1), 1–36. doi:10.18637/jss.v108.i01.

Examples

library(tipsae)

# loading toy cross sectional dataset
data("emilia_cs")

# fitting a cross sectional model
fit_beta <- fit_sae(formula_fixed = hcr ~ x, data = emilia_cs, domains = "id",
                    type_disp = "var", disp_direct = "vars", domain_size = "n",
                    # MCMC setting to obtain a fast example. Remove next line for reliable results.
                    chains = 1, iter = 150, seed = 0)


# Spatio-temporal model: it may take some time to fit
## Not run: 
# loading toy panel dataset
data("emilia")
# loading the shapefile of the concerned areas
data("emilia_shp")

# fitting a spatio-temporal model
fit_ST <- fit_sae(formula_fixed = hcr ~ x,
                  domains = "id",
                  disp_direct = "vars",
                  type_disp = "var",
                  domain_size = "n",
                  data = emilia,
                  spatial_error = TRUE,
                  spatial_df = emilia_shp,
                  domains_spatial_df = "NAME_DISTRICT",
                  temporal_error = TRUE,
                  temporal_variable = "year",
                  max_treedepth = 15,
                  seed = 0)

## End(Not run)
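The spatio-temporal call above works on panel data, presumably in long format (one row per domain and time point, with temporal_variable naming the time column), and the domain labels in data presumably need to match the domains_spatial_df column of the shapefile. The base R sketch below builds such a layout with invented values and checks that every domain in the data has a matching name in a hypothetical vector of shapefile names; it does not reproduce the actual emilia data.

```r
# Hypothetical panel: 4 domains observed over 3 years, in long format
panel <- expand.grid(id = c("D1", "D2", "D3", "D4"),
                     year = 2015:2017,
                     stringsAsFactors = FALSE)
set.seed(2)
panel$hcr <- round(runif(nrow(panel), 0.05, 0.35), 3)

# Hypothetical domain names as stored in the shapefile attribute column
shp_names <- c("D1", "D2", "D3", "D4")

# Every domain in the data should have a matching polygon name
missing_in_shp <- setdiff(unique(panel$id), shp_names)
stopifnot(length(missing_in_shp) == 0)

# Each domain should appear once per time point
print(table(panel$id))
```

Running such checks before fitting is cheap compared with a spatio-temporal MCMC run, and a mismatch in domain labels is easier to diagnose here than inside the sampler.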

[Package tipsae version 1.0.2 Index]