impacts {MLID}

R Documentation

Impact calculations

Description

Calculates the total contribution to the index of dissimilarity made by neighbourhoods grouped by region or other higher-level geography.

Usage

impacts(data, vars, levels, omit = NULL)

Arguments

`data`	a data frame with `ncol(data) >= 2`. Each row of the data represents a neighbourhood or some other areal unit for which counts of population have been made.
`vars`	a character or numeric vector of length 2 or 3 giving either the names or column positions of the variables in `data`, in the following order: the number of population group Y in each neighbourhood; the number of population group X in each neighbourhood.
`levels`	a character or numeric vector of minimum length 1 identifying either the names or column positions of the variables in `data` that record the higher-level grouping to which each lower-level unit belongs.
`omit`	(optional) a character vector containing the names of places to search for in the data and to omit from the calculations
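The layout expected by `data`, `vars` and `levels` can be illustrated with a small hypothetical data frame. The snippet only shows that, in base R, selecting columns by name or by position picks out the same variables; that `impacts` resolves its arguments in exactly this way internally is an assumption, and the toy values are invented for illustration.

```r
# Hypothetical toy data frame laid out as impacts() expects:
# one row per neighbourhood, two count columns plus higher-level identifiers.
toy <- data.frame(
  Bangladeshi = c(10, 40, 5),    # population group Y
  WhiteBrit   = c(50, 10, 30),   # population group X
  LAD         = c("L1", "L1", "L2"),
  RGN         = c("R1", "R1", "R1")
)

# vars: Y first, then X, given either by name or by column position.
vars_by_name <- c("Bangladeshi", "WhiteBrit")
vars_by_pos  <- c(1, 2)

# levels: the higher-level grouping columns, again by name or position.
levels_by_name <- c("LAD", "RGN")
levels_by_pos  <- c(3, 4)

# In base R both forms select the same columns.
identical(toy[vars_by_name], toy[vars_by_pos])      # TRUE
identical(toy[levels_by_name], toy[levels_by_pos])  # TRUE
```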

Details

When the index of dissimilarity (ID) is estimated as a regression model, the residuals from that model are the differences between the share of population group Y and the share of population group X observed in each neighbourhood. The `impacts` function summarises those differences by higher-level geography to show which places or regions contain the neighbourhoods that contribute most to the ID. The measures are useful for understanding where the separation of the two population groups is greatest. However, to look at scale effects, where the effect of each level net of the other levels is wanted, fit a multilevel index using the function `id`.
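A minimal base-R sketch of the quantity described above, assuming the standard definition ID = 0.5 * sum(|y_i/Y - x_i/X|), where y_i and x_i are the neighbourhood counts and Y and X the totals. Each neighbourhood contributes half its absolute share difference, and summing those contributions within a region gives the region's share of the ID. The toy data frame and its column names are hypothetical, and this is not the package's own code; `impacts` may obtain the differences from the fitted model's residuals instead.

```r
# Hypothetical toy data: five neighbourhoods nested in two regions.
toy <- data.frame(
  Y   = c(10, 40, 5, 25, 20),   # counts of population group Y
  X   = c(50, 10, 30, 20, 40),  # counts of population group X
  RGN = c("A", "A", "B", "B", "B")
)

# Difference between each neighbourhood's share of Y and its share of X.
d <- toy$Y / sum(toy$Y) - toy$X / sum(toy$X)

# Index of dissimilarity: half the sum of the absolute differences.
ID <- 0.5 * sum(abs(d))

# Each neighbourhood contributes 0.5 * |d| to the ID; summing within
# each region gives that region's total contribution.
contrib <- tapply(0.5 * abs(d), toy$RGN, sum)

# Contributions expressed as a percentage of the ID (they sum to 100).
pcntID <- 100 * contrib / ID
ID
pcntID
```

With these toy counts the ID is 0.45, and region A accounts for roughly 63% of it despite holding only two of the five neighbourhoods.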

Value

A list of data frames, one per higher-level geography named in `levels`, each containing the impact calculations for that geography. The variables are:

`pcntID`	The total contribution of the neighbourhoods within the region to the overall ID score, expressed as a percentage.
`pcntN`	The number of neighbourhoods within the region, expressed as a percentage of the total number in `data`.
`impact`	The ratio of `pcntID` to `pcntN`, multiplied by 100. Values over 100 indicate a group of neighbourhoods that has a disproportionately high impact on the ID.
`scldMean`	The average difference between the share of the Y population and the share of the X population, scaled by the standard error of the differences for the whole data set (to give a z-value). Positive values mean that, on average, the region has a greater share of the Y population than of the X population; negative values mean it has a smaller share.
`scldSD`	A measure of how much the differences between the shares of the two populations vary within the region. It is the standard deviation of those differences, scaled by the standard error for the whole data set. Higher values indicate greater variability within the region.
`scldMin`	The minimum difference between the share of the Y population and the share of the X population for neighbourhoods within the region, scaled by the standard error.
`scldMax`	The maximum difference between the share of the Y population and the share of the X population for neighbourhoods within the region, scaled by the standard error.
`pNYgtrNX`	The percentage of neighbourhoods within the region where the count of population group Y (as opposed to its share) is greater than the count of population group X.
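The columns listed above can be approximated in base R for a single grouping level. This is a hedged sketch on hypothetical toy data, not the package's implementation. In particular, the whole-data "standard error" used for the `scld*` columns is assumed here to be the standard deviation of all neighbourhood share differences, `sd(d)`; the package may use a different scale (for example the residual standard error of its regression).

```r
# Hypothetical toy data: five neighbourhoods nested in two regions.
toy <- data.frame(
  Y   = c(10, 40, 5, 25, 20),   # counts of population group Y
  X   = c(50, 10, 30, 20, 40),  # counts of population group X
  RGN = c("A", "A", "B", "B", "B")
)

d  <- toy$Y / sum(toy$Y) - toy$X / sum(toy$X)  # share differences
ID <- 0.5 * sum(abs(d))                        # index of dissimilarity
se <- sd(d)  # ASSUMPTION: whole-data scale for the scld* columns

# Summarise the neighbourhoods with row indices idx into the documented columns.
summarise_region <- function(idx) {
  di <- d[idx]
  data.frame(
    pcntID   = 100 * 0.5 * sum(abs(di)) / ID,
    pcntN    = 100 * length(di) / length(d),
    scldMean = mean(di) / se,
    scldSD   = sd(di) / se,   # NA for a region with a single neighbourhood
    scldMin  = min(di) / se,
    scldMax  = max(di) / se,
    pNYgtrNX = 100 * mean(toy$Y[idx] > toy$X[idx])  # compares counts, not shares
  )
}

# One row per region, named by region, then the impact ratio.
groups <- split(seq_len(nrow(toy)), toy$RGN)
res <- do.call(rbind, lapply(groups, summarise_region))
res$impact <- 100 * res$pcntID / res$pcntN

# Sort by impact score, largest first.
res <- res[order(res$impact, decreasing = TRUE), ]
res
```

Region A holds 40% of the toy neighbourhoods but about 63% of the ID, so its impact is about 157, above the threshold of 100 at which a group counts as disproportionately influential.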

Examples

data(aggdata)
impx <- impacts(aggdata, c("Bangladeshi", "WhiteBrit"), c("LAD","RGN"))
head(impx)
# Each data frame is sorted by impact score.
# For $RGN, London has the greatest impact on the ID.
# The 'excess' share of the Bangladeshi population is not especially
# significant (see scldMean), but there is a lot of variation between
# neighbourhoods (see scldSD).
# For $LAD, note the impacts of Tower Hamlets and Newham.

[Package MLID version 1.0.1 Index]