theil_t {accessibility}

R Documentation

Theil T Index

Description

Calculates the Theil T Index of a given accessibility distribution. Values range from 0 (when all individuals have exactly the same accessibility level) to the natural log of n, where n is the number of individuals in the accessibility dataset. If the individuals can be classified into mutually exclusive and completely exhaustive groups, the index can be decomposed into a between-groups inequality component and a within-groups inequality component.
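The range and the decomposition described above can be illustrated with a minimal base R sketch. It assumes the standard population-weighted formulation of the Theil T index; `theil_t_sketch()` and `theil_decomposition_sketch()` are hypothetical helpers written for illustration, not functions from the package, and the package's internal implementation may differ.

```r
# Hypothetical helpers for illustration only; not part of the package.
# x: accessibility level of each cell; w: population of each cell.
theil_t_sketch <- function(x, w = rep(1, length(x))) {
  mu <- sum(w * x) / sum(w)      # population-weighted mean accessibility
  share <- (w * x) / sum(w * x)  # each cell's share of total accessibility
  # cells with zero accessibility contribute 0 (0 * log(0) is treated as 0)
  sum(ifelse(x > 0, share * log(x / mu), 0))
}

# g: group label of each cell. Assumes groups are mutually exclusive and that
# every group has positive population and positive total accessibility.
theil_decomposition_sketch <- function(x, w, g) {
  mu <- sum(w * x) / sum(w)
  total_access <- sum(w * x)
  between <- 0
  within <- 0
  for (idx in split(seq_along(x), g)) {
    s_g <- sum(w[idx] * x[idx]) / total_access   # group's share of accessibility
    mu_g <- sum(w[idx] * x[idx]) / sum(w[idx])   # group's mean accessibility
    between <- between + s_g * log(mu_g / mu)
    within <- within + s_g * theil_t_sketch(x[idx], w[idx])
  }
  c(total = theil_t_sketch(x, w), between = between, within = within)
}

# lower bound: everyone has the same accessibility level
theil_t_sketch(c(5, 5, 5, 5))

# upper bound: one of four individuals holds all the accessibility
all.equal(theil_t_sketch(c(0, 0, 0, 10)), log(4))

# the between- and within-groups components add up to the total
theil_decomposition_sketch(
  x = c(10, 20, 30, 40),
  w = c(1, 2, 3, 4),
  g = c("a", "a", "b", "b")
)
```

In this sketch each group's within-groups inequality is weighted by the group's share of total accessibility, which is what makes the two components sum to the total.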

Usage

theil_t(
  accessibility_data,
  sociodemographic_data,
  opportunity,
  population,
  socioeconomic_groups = NULL,
  group_by = character(0)
)

Arguments

`accessibility_data`	A data frame. The accessibility levels whose inequality should be calculated. Must contain the columns `id` and any others specified in `opportunity`.
`sociodemographic_data`	A data frame. The distribution of sociodemographic characteristics of the population in the study area cells. Must contain the columns `id` and any others specified in `population` and `socioeconomic_groups`.
`opportunity`	A string. The name of the column in `accessibility_data` with the accessibility levels to be considered when calculating inequality levels.
`population`	A string. The name of the column in `sociodemographic_data` with the number of people in each cell. Used to weight accessibility levels when calculating inequality.
`socioeconomic_groups`	A string. The name of the column in `sociodemographic_data` whose values identify the socioeconomic groups that should be used to calculate the between- and within-groups inequality levels. If `NULL` (the default), between- and within-groups components are not calculated and only the total aggregate inequality is returned.
`group_by`	A `character` vector. When not `character(0)` (the default), indicates the `accessibility_data` columns by which the inequality estimates should be grouped. For example, if `accessibility_data` includes a `scenario` column that identifies the distinct scenarios that each accessibility estimate refers to (e.g. before and after a transport policy intervention), passing `"scenario"` to this parameter results in inequality estimates grouped by scenario.
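A possible use of `group_by` is sketched below, reusing the sample data shipped with the package. The two "scenarios" are an assumption made purely for illustration: they are just accessibility estimates computed with two different travel time cutoffs, stacked into one data frame with a hand-made `scenario` column, rather than the result of an actual policy intervention.

```r
library(accessibility)

data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))

# illustrative stand-ins for "before" and "after" scenarios: the same
# accessibility measure calculated with 30- and 60-minute cutoffs
access_before <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  cutoff = 30,
  opportunity = "jobs",
  travel_cost = "travel_time"
)
access_after <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  cutoff = 60,
  opportunity = "jobs",
  travel_cost = "travel_time"
)

# label each set of estimates and stack them into a single data frame
access_before$scenario <- "before"
access_after$scenario <- "after"
access_scenarios <- rbind(access_before, access_after)

# one inequality estimate per value of the scenario column
ti_by_scenario <- theil_t(
  access_scenarios,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  group_by = "scenario"
)
ti_by_scenario
```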

Value

If `socioeconomic_groups` is `NULL`, a data frame containing the total Theil T estimates for the study area. Otherwise, a list containing three data frames: one summarizing the total inequality and the between- and within-groups components, one listing the contribution of each group to the between-groups component, and one listing the contribution of each group to the within-groups component.

Examples


data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))

access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  cutoff = 30,
  opportunity = "jobs",
  travel_cost = "travel_time"
)

ti <- theil_t(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
ti

# to calculate inequality between and within income deciles, we pass
# "income_decile" to socioeconomic_groups.
# some cells, however, have an NA income decile: they have no population, so
# their income per capita is NaN. we remove these cells from both the
# accessibility and the sociodemographic data, otherwise the output would
# include NA values (note that subsetting the data like this doesn't affect
# the assumption that groups are completely exhaustive, because cells with NA
# income decile don't have any population)

na_decile_ids <- land_use_data[is.na(land_use_data$income_decile), ]$id
access <- access[! access$id %in% na_decile_ids, ]
sociodem_data <- land_use_data[! land_use_data$id %in% na_decile_ids, ]

ti <- theil_t(
  access,
  sociodemographic_data = sociodem_data,
  opportunity = "jobs",
  population = "population",
  socioeconomic_groups = "income_decile"
)
ti