R: Communities and Crime Data Set

communities.and.crime {fairml}

R Documentation

Communities and Crime Data Set

Description

Combined socio-economic data from the 1990 Census, law enforcement data from the 1990 LEMAS survey, and crime data from the 1995 FBI UCR for various communities in the United States.

Usage

data(communities.and.crime)

Format

The data contains 1969 observations and 104 variables. See the UCI Machine Learning Repository for details.

Note

The data set has been pre-processed as in Komiyama et al. (2018), with the following exceptions:

the variable community has been dropped, as it is non-predictive and contains a sizeable number of missing values;
the variables LemasSwornFT, LemasSwFTPerPop, LemasSwFTFieldOps, LemasSwFTFieldPerPop, LemasTotalReq, LemasTotReqPerPop, PolicReqPerOffic, PolicPerPop, RacialMatchCommPol, PctPolicWhite, PctPolicBlack, PctPolicHisp, PctPolicAsian, PctPolicMinor, OfficAssgnDrugUnits, NumKindsDrugsSeiz, PolicAveOTWorked, PolicCars, PolicOperBudg, LemasPctPolicOnPatr, LemasGangUnitDeploy and PolicBudgPerPop have been dropped because they have more than 80% missing values.
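
The missing-value rule behind the second exception can be sketched as follows. This is only an illustration on a toy data frame with made-up values, not the actual pre-processing script used to build the data set; the object names raw, missing.share and cleaned are hypothetical.

```r
# Toy data frame standing in for the raw data (values are made up).
raw = data.frame(
  population = c(0.19, 0.00, 0.04, 0.01, 0.02, 0.03),
  PolicCars = c(NA, NA, NA, NA, NA, 0.12),
  ViolentCrimesPerPop = c(0.20, 0.67, 0.43, 0.12, 0.03, 0.14)
)

# proportion of missing values in each column.
missing.share = colMeans(is.na(raw))

# keep only the columns with at most 80% missing values.
cleaned = raw[, missing.share <= 0.8, drop = FALSE]
names(cleaned)
```

Here PolicCars is missing in five of the six rows, so it exceeds the 80% threshold and is dropped, while the other two columns are kept.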

In that paper, ViolentCrimesPerPop is the response variable, racepctblack and PctForeignBorn are the sensitive attributes, and the remaining variables are used as predictors.

The data contain too many variables to list here: we refer the reader to the documentation on the UCI Machine Learning Repository.

References

UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/datasets/communities+and+crime

Examples

library(fairml)
data(communities.and.crime)

# keep complete cases only, then set up short-hand variable names:
# response (r), sensitive attributes (s) and predictors (p).
cc = communities.and.crime[complete.cases(communities.and.crime), ]
r = cc[, "ViolentCrimesPerPop"]
s = cc[, c("racepctblack", "PctForeignBorn")]
p = cc[, setdiff(names(cc), c("ViolentCrimesPerPop", names(s)))]

m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)

m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)

[Package fairml version 0.8 Index]