id {MLID}    R Documentation
(Multilevel) index of dissimilarity
Description
Returns either the standard index of dissimilarity (ID) or its multilevel equivalent.
Usage
id(data, vars, levels = NA, expected = FALSE, nsims = 100, omit = NULL)
Arguments
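data | a data frame containing the variables identified by vars and, for a multilevel index, levels.
vars | a character or numeric vector of length 2 or 3 giving either the names or column positions of the variables in data that contain the counts for population group Y, population group X and, optionally, the total population of each neighbourhood.
levels | a character or numeric vector of minimum length 1 identifying either the names or column positions of the variables in data that define the higher levels of the geographical hierarchy. The default, NA, returns the standard (single-level) ID.
expected | a logical scalar. Should the expected value of the ID under randomisation be calculated? Requires a measure of the total population in each neighbourhood. If the total population is omitted from vars, a warning is generated.
nsims | a single number giving the number of random draws used to calculate the expected value. Default is 100.
omit | (optional) a character vector containing the names of places to search for in the data and to omit from the calculations.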
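The expected and nsims arguments can be illustrated with a small base-R sketch. It assumes one plausible randomisation scheme (each group's total is reallocated across neighbourhoods at random, in proportion to the total population, using rmultinom) and uses a hypothetical helper, dissim(), that is not part of MLID; the procedure MLID itself uses may differ. With small group counts, chance alone can produce a sizeable ID, which is why the expected value is worth checking.

```r
set.seed(1)

# Made-up counts; Total is the total population of each neighbourhood
Y <- c(3, 1, 0, 2, 4, 0, 1, 2)
X <- c(40, 55, 38, 61, 47, 52, 44, 58)
Total <- c(60, 70, 50, 80, 65, 70, 60, 75)

# Hypothetical helper (not part of MLID): standard index of dissimilarity
dissim <- function(y, x) 0.5 * sum(abs(y / sum(y) - x / sum(x)))

# Assumed randomisation scheme: redistribute each group's total across
# neighbourhoods at random, in proportion to the total population,
# and average the ID over nsims draws
expected_id <- function(y, x, total, nsims = 100) {
  p <- total / sum(total)
  sims <- replicate(nsims, {
    y_sim <- as.vector(rmultinom(1, sum(y), p))
    x_sim <- as.vector(rmultinom(1, sum(x), p))
    dissim(y_sim, x_sim)
  })
  mean(sims)
}

observed <- dissim(Y, X)
expected <- expected_id(Y, X, Total, nsims = 100)
c(observed = observed, expected = expected)
```

Raising nsims reduces the simulation error in the expected value at the cost of run time.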
Details
If Y is the number of population group Y living in each neighbourhood and X is the number of population group X, then id measures how unevenly the two groups are distributed relative to one another, giving a measure of segregation. In addition, for geographically hierarchical data, scale effects may be explored to examine the scale of geographical clustering.
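As a point of reference, the standard ID is half the sum, over neighbourhoods, of the absolute differences between each neighbourhood's share of group Y and its share of group X. The sketch below uses a hypothetical helper, dissim(), that is not part of MLID, to show the two extremes of the index and one intermediate case.

```r
# Hypothetical helper (not part of MLID): standard index of dissimilarity
# y, x: counts of population groups Y and X in each neighbourhood
dissim <- function(y, x) {
  0.5 * sum(abs(y / sum(y) - x / sum(x)))
}

dissim(c(10, 20, 30), c(10, 20, 30))  # identical distributions: 0
dissim(c(50, 0, 0), c(0, 25, 25))     # no neighbourhood shared: 1
dissim(c(20, 20, 20), c(10, 20, 30))  # partial unevenness: 1/6
```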
The method works by treating the calculation of the ID as a regression problem. If Y is recalculated as the share per neighbourhood of the total count of population group Y (i.e. Y <- Y / sum(Y)) and X is recalculated in the same way for X, then fitting ols <- lm(Y ~ 0, offset = X) generates a set of residuals, e <- residuals(ols), where each residual is the difference between the share of Y and the share of X in a neighbourhood. Half the sum of the absolute values of those residuals gives the ID: id <- 0.5 * sum(abs(e)).
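The regression formulation can be checked in base R with made-up counts for five neighbourhoods. The empty model lm(Y ~ 0, offset = X) has no coefficients, so its fitted values are X and its residuals are Y - X; half the sum of their absolute values matches the ID computed directly from the shares.

```r
# Made-up counts for two population groups in five neighbourhoods
Y <- c(10, 40, 5, 0, 45)
X <- c(30, 10, 25, 20, 15)

# Convert counts to each neighbourhood's share of the group total
Y <- Y / sum(Y)
X <- X / sum(X)

# Empty model with X as an offset: fitted values are X, residuals are Y - X
ols <- lm(Y ~ 0, offset = X)
e <- residuals(ols)

# ID from the regression residuals and directly from the shares
id_reg <- 0.5 * sum(abs(e))
id_direct <- 0.5 * sum(abs(Y - X))

# Both are 0.6 for these counts (up to floating-point rounding)
c(regression = id_reg, direct = id_direct)
```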
The advantage of calculating the ID in this way is that it can be extended to geographical hierarchies, where neighbourhoods at the base level are grouped into larger regions at the next level, and so forth. For the multilevel index, the residuals are estimated at, and partitioned between, each level of the model net of the other levels, allowing scale effects to be explored.
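The idea of partitioning residuals between levels can be sketched without lme4. This is a simplification, not MLID's method: instead of the random-effects model that id() fits with lmer, it splits each residual into a region component (the mean residual in that region) and a neighbourhood component (the remainder), using made-up shares and a hypothetical grouping of six neighbourhoods into two regions. The resulting percentages are only analogous to, and will not match, those reported by MLID.

```r
# Made-up shares of groups Y and X (each sums to 1) in six neighbourhoods
Y <- c(0.10, 0.10, 0.15, 0.15, 0.20, 0.30)
X <- c(0.25, 0.15, 0.10, 0.20, 0.15, 0.15)
# Hypothetical higher level: neighbourhoods 1-3 in region A, 4-6 in region B
region <- c("A", "A", "A", "B", "B", "B")

# Residuals of the single-level model: difference in shares
e <- Y - X

# Region component: mean residual within each region
e_region <- ave(e, region)
# Neighbourhood component: deviation from the region mean
e_nbhd <- e - e_region

# Share of the sum of squared residuals attributable to each level
ss <- c(region = sum(e_region^2), neighbourhood = sum(e_nbhd^2))
round(100 * ss / sum(ss), 1)  # region 27.3, neighbourhood 72.7
```

Because the two components are orthogonal, their sums of squares add up to the total, so the two percentages sum to 100.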
print(index) displays the ID value, the expected value of the ID under randomisation (NA if not calculated) and, for a multilevel model, the percentage share of the total variance due to each level (a measure of the geographical scale of segregation; see the examples given by checkerboard) and the holdback scores (see holdback).
Value
An object of class index. This is a value between zero and one, where 0 implies no segregation and 1 means 'complete segregation': wherever group Y is located, X is not (and vice versa). If expected = TRUE, the expected value under randomisation is also given. In addition, the object contains the following attributes:
- attr(x, "ols"): an object of class lm. The OLS regression used to calculate the ID. Useful for identifying significant residuals (see Examples below).
- attr(x, "vars"): the names of Y and X in data.
- attr(x, "data"): a data frame with the population counts for Y and X.
and also, for a multilevel model,
- attr(x, "mlm"): an object of class lmerMod, fitted using lmer.
- attr(x, "variance"): the percentage of the total variance due to each level of the model. This indicates the scale at which segregation is most prominent.
- attr(x, "holdback"): the percentage change in the ID that occurs if, at each level, its contribution to the ID net of the other levels is held back (set to zero).
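The holdback idea can be sketched with the same kind of simplified group-mean decomposition of the residuals. This is again not the lmer-based partition MLID uses, so the figures are only illustrative, and the shares and region labels are made up. Setting one level's component to zero and recomputing the ID gives the percentage change associated with that level.

```r
# Made-up shares and hypothetical regions, for illustration only
Y <- c(0.10, 0.10, 0.15, 0.15, 0.20, 0.30)
X <- c(0.25, 0.15, 0.10, 0.20, 0.15, 0.15)
region <- c("A", "A", "A", "B", "B", "B")

e <- Y - X
e_region <- ave(e, region)   # region component (mean residual per region)
e_nbhd <- e - e_region       # neighbourhood component (remainder)

# ID implied by a vector of residuals
id_from <- function(res) 0.5 * sum(abs(res))
id_full <- id_from(e)        # 0.25

# Percentage change in the ID when each level's component is set to zero
holdback <- c(
  region = 100 * (id_from(e_nbhd) - id_full) / id_full,
  neighbourhood = 100 * (id_from(e_region) - id_full) / id_full
)
round(holdback, 1)  # region -20, neighbourhood -40
```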
See Also
checkerboard
print.index
holdback
residuals.index
lmer
sumup
Harris R (2017) Fitting a multilevel index of segregation in R: using the MLID package http://rpubs.com/profrichharris/MLID
Harris R (2017) Measuring the scales of segregation: Looking at the residential separation of White British and other school children in England using a multilevel index of dissimilarity http://bit.ly/2lQ4r0n
Examples
data(ethnicities)
head(ethnicities)
# Calculate the standard index value
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"))
## Not run:
# Calculate also the expected value under randomisation
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"), expected = TRUE)
# will generate a warning because the total population per neighbourhood
# has not been specified
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit", "Persons"),
expected = TRUE)
# The expected value is a high percentage of the actual value, so
# aggregate the data to a higher-level geography...
aggdata <- sumup(ethnicities, sumby = "LSOA", drop = "OA")
head(aggdata)
# Multilevel models
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
levels = c("MSOA","LAD","RGN"))
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
levels = c("MSOA","LAD","RGN"), omit = c("Tower Hamlets", "Newham"))
## End(Not run)