xwalk_ags {ags} | R Documentation |
Crosswalk Municipality or District Statistics
Description
This function constructs time series of counts for Germany's municipalities (Gemeinden) and districts (Kreise).
Usage
xwalk_ags(
  data,
  ags,
  time,
  xwalk,
  variables = NULL,
  strata = NULL,
  weight = NULL,
  fuzzy_time = FALSE,
  verbose = TRUE
)
Arguments
data |
A data frame or a data frame extension (e.g. a tibble). |
ags |
Name of the character variable (quoted) with municipality AGS (Gemeinden, 8 digits) or district AGS (Kreise, 5 digits). |
time |
Name of the variable (quoted) identifying the year (YYYY format). Values will be coerced to integers. |
xwalk |
Name of the crosswalk. The following crosswalks are available:
|
variables |
Either a vector of names (quoted) for variables to interpolate or |
strata |
Vector of variable names (quoted) or |
weight |
Name of the interpolation weight or |
fuzzy_time |
If |
verbose |
If |
Details
This function facilitates the use of crosswalks constructed by the BBSR for municipalities and districts in Germany (Milbert 2010). The crosswalks map one year's set of district or municipality identifiers to a later year's identifiers and provide weights for area- or population-weighted interpolation.
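As a rough illustration of what weighted interpolation with a crosswalk involves, the following base R sketch redistributes counts from old units to new units. The toy crosswalk, its column names (ags_old, ags_new, w) and all values are invented for this example; they do not reflect the BBSR files or the internals of xwalk_ags().

```r
# Toy data: counts for two hypothetical old units in one year.
counts <- data.frame(
  ags_old = c("A", "B"),
  voters  = c(1000, 400),
  stringsAsFactors = FALSE
)

# Toy crosswalk (invented): unit A is split 75/25 between X and Y,
# unit B is assigned entirely to Y.
xw <- data.frame(
  ags_old = c("A", "A", "B"),
  ags_new = c("X", "Y", "Y"),
  w       = c(0.75, 0.25, 1),
  stringsAsFactors = FALSE
)

# Attach the weights, scale each count, then sum within each new unit.
merged <- merge(counts, xw, by = "ags_old")
merged$voters_w <- merged$voters * merged$w
interpolated <- aggregate(voters_w ~ ags_new, data = merged, FUN = sum)
names(interpolated)[2] <- "voters"
interpolated
```

Because the weights of each old unit sum to 1 in this toy crosswalk, the total count is the same before and after interpolation.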
All data rows with NAs in either the ags or time variable are excluded. The same applies to all rows with a value in ags or time that never appears in the crosswalk.
Fuzzy matching uses the absolute difference between the year reported in the data and a crosswalk year. If there is a tie, crosswalk years from before the year reported in the data are preferred.
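The matching rule described above can be sketched in base R as follows. The helper match_year() and the crosswalk years are hypothetical and only mirror the description; they are not the package's actual implementation.

```r
# Pick the crosswalk year closest to a data year.
# On a tie, the earlier crosswalk year is returned.
match_year <- function(year, xw_years) {
  xw_years <- sort(unique(as.integer(xw_years)))
  d <- abs(xw_years - as.integer(year))
  # which.min() returns the first minimum; since xw_years is sorted
  # in ascending order, that is the earlier year when there is a tie.
  xw_years[which.min(d)]
}

xw_years <- c(1990L, 1994L, 1998L)  # hypothetical crosswalk years

match_year(1992, xw_years)  # 1990 and 1994 are equally close
match_year(1997, xw_years)
```

Sorting the crosswalk years first is what makes the tie-breaking deterministic in this sketch.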
If area- or population-weighted interpolation is requested (i.e., when variables are supplied), the combination of the variables set in ags, time and strata needs to uniquely identify a row in data.
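One way to check this requirement before calling the function is sketched below in base R. The helper is_unique_key() and the toy data frame with its column names are made up for illustration.

```r
# TRUE if the given key columns identify each row of `data` uniquely.
is_unique_key <- function(data, keys) {
  anyDuplicated(data[, keys, drop = FALSE]) == 0
}

# Toy data (invented): two rows share the same district and year
# and differ only in the stratum variable `party`.
toy <- data.frame(
  district = c("14511", "14511", "14521"),
  year     = c(2013L, 2013L, 2013L),
  party    = c("a", "b", "a"),
  voters   = c(10, 20, 30),
  stringsAsFactors = FALSE
)

is_unique_key(toy, c("district", "year"))           # duplicates without strata
is_unique_key(toy, c("district", "year", "party"))  # unique once strata are included
```

If the check fails, adding the distinguishing variable to strata or aggregating the duplicated rows beforehand are two possible remedies.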
Caution: Data from https://www.regionalstatistik.de/ sometimes include annual values both for merged units (e.g., Städteregion Aachen, 05334) and for their former parts (Kreis Aachen, 05354, and Stadt Aachen, 05313). When such data are crosswalked with fuzzy_time = TRUE and interpolated, the final counts will be off by a factor of approximately 2. The reason is that the final output is the sum of the interpolated counts for the parts and the measured count of the merged unit.
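A possible way to guard against this double counting is to drop one of the two representations before crosswalking. The base R sketch below keeps the merged unit and removes its former parts in years where both are reported. The counts are invented, and which rows to drop is a judgment call for the analyst, not something the package prescribes.

```r
# Toy data (counts invented): in 2008 both the merged unit and its
# former parts are reported; in 2010 only the merged unit is.
toy <- data.frame(
  ags   = c("05334", "05354", "05313", "05334"),
  year  = c(2008L, 2008L, 2008L, 2010L),
  count = c(100, 60, 40, 105),
  stringsAsFactors = FALSE
)

merged_unit  <- "05334"
former_parts <- c("05354", "05313")

# Years in which the merged unit and at least one former part both appear.
overlap_years <- intersect(
  toy$year[toy$ags == merged_unit],
  toy$year[toy$ags %in% former_parts]
)

# Drop the former parts in those years so each count enters only once.
cleaned <- toy[!(toy$ags %in% former_parts & toy$year %in% overlap_years), ]
cleaned
```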
Value
If interpolation is requested, the crosswalked and interpolated data are returned. If interpolation is not requested, the data matched with the crosswalk are returned. The following variables are added:
- row_id: row number of data before matching.
- ags[*]: the crosswalked AGS.
- year_xw: the matched year from the crosswalk.
- [*]_conv: the interpolation weight.
- diff: the absolute difference between year_xw and time.
References
Milbert, Antonia. 2010. "Gebietsreformen – politische Entscheidungen und Folgen für die Statistik." BBSR-Berichte kompakt 6/2010. Bundesinstitut für Bau-, Stadt- und Raumforschung.
Examples
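# Load the example data shipped with the package
data(btw_sn)

# Crosswalk the district-level data with the "xd20" crosswalk and
# interpolate "voters" and "valid" using the "pop" weight
btw_sn_ags20 <- xwalk_ags(
  data = btw_sn,
  ags = "district",
  time = "year",
  xwalk = "xd20",
  variables = c("voters", "valid"),
  weight = "pop"
)

head(btw_sn_ags20)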