R: (Multilevel) index of dissimilarity

id {MLID}

R Documentation

(Multilevel) index of dissimilarity

Description

Returns either the standard index of dissimilarity (ID) or its multilevel equivalent.

Usage

id(data, vars, levels = NA, expected = FALSE, nsims = 100, omit = NULL)

Arguments

`data`	a data frame with `ncol(data) >= 2`. Each row of the data represents a neighbourhood or some other areal unit for which counts of population have been made.
`vars`	a character or numeric vector of length 2 or 3 giving either the names or column positions of the variables in `data`, in the following order: (1) the number of population group Y in each neighbourhood; (2) the number of population group X in each neighbourhood; (3) (optional) the total population in each neighbourhood.
`levels`	a character or numeric vector of minimum length 1 identifying either the names or column positions of the variables in `data` that record to which higher-level grouping each lower-level unit belongs. If `levels = NA`, the default, then only the standard index of dissimilarity is calculated.
`expected`	a logical scalar. Should the expected value of the ID under randomisation be calculated? Requires a measure of the total population in each neighbourhood. If omitted from `vars`, that total will be calculated as `sum(X + Y)`.
`nsims`	a vector, the number of random draws to be used for calculating the expected value. Default is 100.
`omit`	(optional) a character vector containing the names of places to search for in the data and to omit from the calculations.

Details

If Y is the number of population group Y living in each neighbourhood and X is the number of population group X, then `id` measures how unevenly the two groups are distributed relative to one another, which is a measure of segregation. In addition, for geographically hierarchical data, scale effects may be explored to examine the scale of geographical clustering.

The method works by treating the calculation of the ID as a regression problem. If Y is recalculated as the share per neighbourhood of the total count of population group Y (i.e. `Y <- Y / sum(Y)`) and X is recalculated in the same way for X, then fitting `ols <- lm(Y ~ 0, offset = X)` generates a set of residuals, `e <- residuals(ols)`, where each residual is the difference between the share of Y and the share of X in a neighbourhood. The sum of the absolute values of those residuals gives the ID: `id <- 0.5 * sum(abs(e))`.
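As a minimal sketch of the regression view of the ID, the following base R code compares the direct calculation with the residual-based one. The data frame `counts` and its values are hypothetical and chosen only for illustration; nothing here calls the MLID package itself.

```r
# Hypothetical counts for four neighbourhoods (illustrative values only)
counts <- data.frame(Y = c(10, 20, 30, 40),
                     X = c(40, 30, 20, 10))

# Convert counts to each neighbourhood's share of the group total
y_share <- counts$Y / sum(counts$Y)
x_share <- counts$X / sum(counts$X)

# Direct calculation: half the sum of absolute differences in shares
id_direct <- 0.5 * sum(abs(y_share - x_share))

# Regression calculation: no predictors, the X share enters as an offset,
# so each residual is y_share - x_share
ols <- lm(y_share ~ 0, offset = x_share)
e <- residuals(ols)
id_lm <- 0.5 * sum(abs(e))

# The two routes should agree
print(c(direct = id_direct, regression = id_lm))
```

With these toy counts the share differences are -0.3, -0.1, 0.1 and 0.3, so both routes give an ID of 0.4.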

The advantage of calculating the ID in this way is that it can be extended to consider geographic hierarchies, where neighbourhoods at the base level can be grouped into larger regions at the next level, and so forth. Then, for the multilevel index, the residuals are estimated at and partitioned between each level of the model net of the other levels, allowing scale effects to be explored.
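The idea of partitioning residuals between levels can be illustrated with a rough base R analogue. This sketch is not the package's method: the multilevel index is fitted with a mixed model, whereas the code below simply splits each residual into a region mean and a within-region deviation. The data frame `toy`, its column names and its values are hypothetical.

```r
# Hypothetical data: six neighbourhoods nested in two regions
toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  Y = c(30, 25, 20, 10, 10, 5),
  X = c(5, 10, 10, 20, 25, 30)
)

# Residuals as in the single-level case: difference in shares
toy$e <- toy$Y / sum(toy$Y) - toy$X / sum(toy$X)

# Crude two-level split (an assumption for illustration only):
# region component = mean residual within each region,
# neighbourhood component = what is left over
toy$e_region <- ave(toy$e, toy$region)
toy$e_nbhd <- toy$e - toy$e_region

# Percentage share of the total sum of squares at each level
ss <- c(region = sum(toy$e_region^2),
        neighbourhood = sum(toy$e_nbhd^2))
share <- 100 * ss / sum(ss)
print(round(share, 1))
```

A large region share in such a split would suggest that the unevenness is mostly between regions rather than between neighbourhoods within them, which is the kind of scale effect the multilevel index is designed to quantify more carefully.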

`print(index)` displays the ID value, the expected value of the ID under randomisation (`NA` if not calculated) and, for a multilevel model, the percentage share of the total variance due to each level (a measure of the geographical scale of segregation; see the examples given by `checkerboard`) and the holdback scores (see `holdback`).
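The expected value under randomisation can be approximated by simulation. The sketch below is an assumption about how such a calculation might look, not the package's documented procedure: it supposes that each neighbourhood's count of group Y is redrawn from a binomial distribution using the neighbourhood total and the overall proportion of Y, and that X is the remainder of the total. The helper `id_value` and all data values are hypothetical.

```r
set.seed(1)  # for reproducibility of the random draws

# Hypothetical neighbourhood totals and counts of group Y
total <- c(120, 80, 150, 100, 50)
Y <- c(30, 10, 45, 5, 10)
X <- total - Y  # assumption: the two groups exhaust the total

# Hypothetical helper: the standard index of dissimilarity
id_value <- function(Y, X) 0.5 * sum(abs(Y / sum(Y) - X / sum(X)))

observed_id <- id_value(Y, X)

# Assumed randomisation scheme: binomial draws with the overall Y proportion
p <- sum(Y) / sum(total)
nsims <- 100
sims <- replicate(nsims, {
  Y_sim <- rbinom(length(total), size = total, prob = p)
  id_value(Y_sim, total - Y_sim)
})
expected_id <- mean(sims)

print(c(observed = observed_id, expected = expected_id))
```

If the expected value is a large fraction of the observed one, much of the measured unevenness could arise by chance from small counts, which is the situation the Examples address by aggregating to a higher-level geography.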

Value

An object of class index. This is a value between zero and one, where 0 implies no segregation and 1 means 'complete segregation': wherever group Y is located, X is not (and vice versa). If expected = TRUE, the expected value under randomisation is also given. In addition, the object contains the following attributes:

attr(index, "ols") an object of class lm. The OLS regression used to calculate the ID. Useful for identifying significant residuals (see Example below)
attr(index, "vars") the names of Y and X in data
attr(index, "data") a data frame with the population counts for Y and X

and also, for a multilevel model,

attr(index, "mlm") an object of class lmerMod. Fitted using lmer
attr(index, "variance") the percentage of the total variance due to each level of the model. This indicates the scale at which the segregation is most prominent
attr(index, "holdback") records the percentage change in the ID that occurs if, at each level, its contribution to the ID net of other levels is held back (set to zero)

Examples

data(ethnicities)
head(ethnicities)
# Calculate the standard index value
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"))

## Not run: 
# Calculate also the expected value under randomisation
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"), expected = TRUE)
# will generate a warning because the total population per neighbourhood
# has not been specified
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit", "Persons"),
   expected = TRUE)
# The expected value is a high percentage of the actual value so
# aggregate it into a higher level geography...
aggdata <- sumup(ethnicities, sumby = "LSOA", drop = "OA")
head(aggdata)

# Multilevel models
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA", "LAD", "RGN"))
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA", "LAD", "RGN"), omit = c("Tower Hamlets", "Newham"))

## End(Not run)

[Package MLID version 1.0.1 Index]