get_adi {sociome} | R Documentation |
Get Area Deprivation Index (ADI) and Berg Indices (ADI-3)
Description
Returns the ADI and ADI-3 of user-specified areas.
Usage
get_adi(
  geography,
  state = NULL,
  county = NULL,
  geoid = NULL,
  zcta = NULL,
  year,
  dataset = c("acs5", "acs3", "acs1", "decennial"),
  geometry = FALSE,
  keep_indicators = FALSE,
  raw_data_only = FALSE,
  cache_tables = TRUE,
  key = NULL,
  seed = NA,
  ...
)
Arguments
geography |
A character string denoting the level of census geography
whose ADIs and ADI-3s you'd like to obtain. Must be one of |
state |
A character string specifying the states whose ADI and ADI-3 data are
desired. Defaults to NULL. |
county |
A vector of character strings specifying the counties whose ADI
and ADI-3 data you're requesting. Defaults to NULL. |
geoid |
A character vector of GEOIDs (use quotation marks and leading
zeros). Defaults to NULL. |
zcta |
A character vector of ZCTAs or the leading digit(s) of ZCTAs (use
quotation marks and leading zeros). Defaults to NULL. Strings under 5 digits long will yield all ZCTAs that begin with those digits. Requires that |
year |
Single integer specifying the year of US Census data to use. |
dataset |
The data set used to calculate ADIs and ADI-3s. Must be one of
"acs5", "acs3", "acs1", or "decennial". The 2010 decennial census did not include the long-form questionnaire used in the 1990 and 2000 censuses, so this function uses the 5-year estimates from the 2010 ACS to supply the data not included in the 2010 decennial census. In fact, the only 2010 decennial variables used are H003002, H014002, P020002, and P020008. Important: data are not always available, depending on the level of geography and the data set chosen. See https://www.census.gov/programs-surveys/acs/guidance/estimates.html. |
geometry |
Logical value indicating whether or not shapefile data should
be included in the result, making the result an sf object rather than a tibble. The shapefile data that is returned is somewhat customizable by passing
certain arguments along to the |
keep_indicators |
Logical value indicating whether or not the resulting
See |
raw_data_only |
Logical, indicating whether or not to skip calculation
of the ADI and ADI-3 and only return the census variables. Defaults to
FALSE. |
cache_tables |
The plural version of the |
key |
Your Census API key as a character string. Obtain one at
http://api.census.gov/data/key_signup.html. Defaults to NULL. |
seed |
Passed to mice::mice() when imputation is attempted (see Missingness and imputation). |
... |
Additional arguments to be passed onto This may be found to be helpful when setting |
Details
Returns a tibble or sf object of the Area Deprivation Indices (ADIs) and
Berg Indices (ADI-3s) of user-specified locations in the United States,
utilizing US Census data. Locations that are listed as having zero
households are excluded from ADI and ADI-3 calculation: their ADI and ADI-3
values will be NA.
Value
If geometry = FALSE (the default), a tibble. If geometry = TRUE is
specified, an sf object.
Reference area
The concept of "reference area" is important to understand when using this function. The algorithm that produced the original ADIs employs factor analysis. As a result, the ADI is a relative measure; the ADI of a particular location is dynamic, varying depending on which other locations were supplied to the algorithm. In other words, ADI will vary depending on the reference area you specify.
For example, the ADI of Orange County, California is x when calculated
alongside all other counties in California, but it is y when calculated
alongside all counties in the US. The get_adi() function enables the user
to define a reference area by feeding a vector of GEOIDs to its geoid
parameter (or, alternatively and for convenience, states and/or counties to
state and county). The function then gathers data from those specified
locations and performs calculations using their data alone.
The Berg Indices (ADI-3) were developed with this principle of relativity in mind, and as such there is no set of seminal ADI-3 values. Thus, the terms "Berg Indices" and "ADI-3" refer more nearly to any values generated using the algorithm employed in this package.
Areas listed as having zero households are excluded from the reference
area, and their ADI and ADI-3 values will be NA.
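The dependence on the reference area can be illustrated without any census data. The following is a minimal sketch that uses stats::prcomp() on invented numbers as a stand-in; it is not the computation get_adi() performs, and the data frame `indicators`, its columns, and the helper `first_pc_score()` are hypothetical. It shows only that the score of the same area changes when the set of areas analysed alongside it changes.

```r
# Invented indicator values for six hypothetical areas (not census data).
indicators <- data.frame(
  poverty_rate = c(10, 20, 30, 40, 5, 60),
  unemployment = c(4, 6, 9, 12, 3, 15),
  row.names    = c("A", "B", "C", "D", "E", "F")
)

# Hypothetical helper: score of each area on the first principal component,
# after centering and scaling within the supplied reference area.
first_pc_score <- function(df) {
  pca <- prcomp(df, center = TRUE, scale. = TRUE)
  pca$x[, "PC1"]
}

# Area "A" scored against a small reference area (A-D) and a larger one (A-F).
score_small <- first_pc_score(indicators[c("A", "B", "C", "D"), ])["A"]
score_large <- first_pc_score(indicators)["A"]

# The sign of a principal component is arbitrary, so compare magnitudes.
abs(c(small = unname(score_small), large = unname(score_large)))
```

The two magnitudes differ because centering, scaling, and the component itself are all estimated from whichever areas are supplied.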
The geoid parameter
Elements of geoid can represent different levels of geography, but each must
be either 2 digits (for states), 5 digits (for counties), 11 digits (for
tracts), or 12 digits (for block groups). geoid must contain character
strings, so use quotation marks as well as leading zeros where applicable.
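The length rule above can be checked before a request is made. This is a minimal base R sketch; `geoid_level()` is a hypothetical helper written for illustration and is not part of sociome, and the tract and block group GEOIDs below are arbitrary strings of the right length.

```r
# Hypothetical helper: map each GEOID to its level of geography by length.
geoid_level <- function(geoid) {
  stopifnot(is.character(geoid))
  levels <- c("2" = "state", "5" = "county", "11" = "tract", "12" = "block group")
  out <- unname(levels[as.character(nchar(geoid))])
  if (anyNA(out)) {
    stop("Each GEOID must be 2, 5, 11, or 12 characters long.")
  }
  out
}

geoid_level(c("10", "51001", "39035101101", "390351011011"))

# Numeric input drops leading zeros, which is why character strings are
# required; sprintf() can restore a fixed width when converting.
county_number <- 1001L
county_geoid <- sprintf("%05d", county_number)
geoid_level(county_geoid)
```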
ADI and ADI-3 factor loadings
The returned tibble or sf is of class adi, and it contains an attribute
called loadings, which contains a tibble of the PCA loadings of each factor.
This is accessible through attr(name_of_tibble, "loadings").
Missingness and imputation
While this function allows flexibility in specifying reference areas (see the Reference area section above), data from the US Census are masked for sparsely populated places, resulting in many missing values.
Imputation is attempted via
mice::mice(m = 1, maxit = 50, method = "pmm", seed = seed). If imputation is
unsuccessful, an error is thrown, but the dataset of indicators on which
imputation was unsuccessful is available via
rlang::last_error()$adi_indicators and the raw census data are available via
rlang::last_error()$adi_raw_data. The former excludes areas with zero
households, but the latter includes them.
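The imputation settings quoted above can be tried on toy data. This is a minimal sketch, assuming the mice package is installed; the data frame `toy` and its columns are invented and are not census indicators, seed = 1 stands in for the function's seed argument, and printFlag = FALSE is added here only to silence iteration output.

```r
library(mice)

# Invented data with a few missing values (not census indicators).
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
toy$b[c(3, 17)] <- NA
toy$c[8] <- NA

# Single imputation by predictive mean matching, mirroring the settings
# quoted in the text above.
imputed <- mice(toy, m = 1, maxit = 50, method = "pmm", seed = 1,
                printFlag = FALSE)

# Extract the completed data set.
completed <- complete(imputed)
anyNA(completed)
```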
One of the indicators of both ADI and the Financial Strength component of
ADI-3 is median family income, but methodological issues with the 2015 and
2016 ACS have rendered this variable unavailable at the block group level
for those years. When requested, this function will use median household
income in its place, with a warning(). See
https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2016-01.html.
API-related error handling
Depending on user input, this function may call its underlying functions
(tidycensus::get_acs() or tidycensus::get_decennial()) many times in order
to accommodate their behavior. When these calls are broken up by state or by
state and county, a message is printed indicating the state or state and
county whose data is being pulled. These calls are wrapped in
purrr::insistently(purrr::rate_delay(), quiet = FALSE), meaning that they
are attempted over and over until success, and tidycensus error messages are
printed as they occur.
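The retry behavior described above can be reproduced with a toy function. This is a minimal sketch, assuming the purrr package is installed; `flaky()` is a hypothetical stand-in for a network call, and the pause of 0.1 seconds and the cap of 5 attempts are chosen for the illustration rather than taken from this package.

```r
library(purrr)

# Hypothetical stand-in for an API call: fails on the first two calls,
# then succeeds.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) {
    stop("transient failure")
  }
  "ok"
}

# Wrap the function so that errors trigger a retry after a fixed delay.
# quiet = FALSE reports each failure as it occurs.
robust <- insistently(
  flaky,
  rate  = rate_delay(pause = 0.1, max_times = 5),
  quiet = FALSE
)

result <- robust()
```

With no cap on the number of attempts, such a wrapper keeps retrying until the call succeeds, which matches the "over and over until success" behavior described above.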
Warnings and disclaimers
Please note that this function calls data from US Census servers, so execution may take a long time depending on the user's internet connection and the amount of data requested.
For advanced users: if changing the dataset argument, be sure to know the
advantages and limitations of the 1-year and 3-year ACS estimates. See
https://www.census.gov/programs-surveys/acs/guidance/estimates.html for
details.
Examples
## Not run:
# Wrapped in \dontrun{} because all these examples take >5 seconds
# and require a Census API key.
# ADI of all census tracts in Cuyahoga County, Ohio
get_adi(geography = "tract", year = 2017, state = "OH", county = "Cuyahoga")
# ADI and ADI-3 of all counties in Connecticut, using the 2014 ACS1 survey.
# Returns a warning because there are only 8 counties.
# A minimum of 30 locations is recommended.
get_adi(geography = "county", state = "CT", year = 2014, dataset = "acs1")
# Areas with zero households will have an ADI and ADI-3 of NA:
queens <-
  get_adi(
    "tract",
    year = 2017,
    state = "NY",
    county = "Queens",
    keep_indicators = TRUE,
    geometry = TRUE
  )
queens %>%
  dplyr::as_tibble() %>%
  dplyr::select(GEOID, NAME, ADI, households = B11005_001) %>%
  dplyr::filter(is.na(ADI) | households == 0) %>%
  print(n = Inf)
# The geoid argument allows for highly customized reference populations.
# ADI of all census tracts in the GEOIDs stored in delmarva_geoids below.
# Notice the mixing of state- ("10") and county-level GEOIDs (the others).
delmarva_geoids <- c("10", "51001", "51131", "24015", "24029", "24035",
                     "24011", "24041", "24019", "24045", "24039", "24047")
delmarva <-
  get_adi(
    geography = "tract",
    geoid = delmarva_geoids,
    dataset = "acs5",
    year = 2009,
    geometry = TRUE
  )
# Demonstration of geom_sf() integration:
require(ggplot2)
# Map the ADI of each tract; lwd = 0 sets the border line width to zero.
delmarva %>% ggplot() + geom_sf(aes(fill = ADI), lwd = 0)
# Setting direction = -1 makes the less deprived areas the lighter ones.
# The argument na.value changes the color of zero-household areas.
queens %>%
  ggplot() +
  geom_sf(aes(fill = ADI), lwd = 0) +
  scale_fill_viridis_c(na.value = "red", direction = -1)
# Obtain factor loadings:
attr(queens, "loadings")
## End(Not run)