communities.and.crime {fairml}    R Documentation
Communities and Crime Data Set
Description
Combined socio-economic data from the 1990 Census, law enforcement data from the 1990 LEMAS survey, and crime data from the 1995 FBI UCR for various communities in the United States.
Usage
data(communities.and.crime)
Format
The data contain 1969 observations and 104 variables. See the UCI Machine Learning Repository for details.
Note
The data set has been pre-processed as in Komiyama et al. (2018), with the following exceptions:
the variable community has been dropped, as it is non-predictive and contains a sizeable number of missing values;
the variables LemasSwornFT, LemasSwFTPerPop, LemasSwFTFieldOps, LemasSwFTFieldPerPop, LemasTotalReq, LemasTotReqPerPop, PolicReqPerOffic, PolicPerPop, RacialMatchCommPol, PctPolicWhite, PctPolicBlack, PctPolicHisp, PctPolicAsian, PctPolicMinor, OfficAssgnDrugUnits, NumKindsDrugsSeiz, PolicAveOTWorked, PolicCars, PolicOperBudg, LemasPctPolicOnPatr, LemasGangUnitDeploy and PolicBudgPerPop have been dropped because they have more than 80% missing values.
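The 80% rule can be illustrated on a toy data frame. This is only a sketch of the kind of filtering described above, not the code actually used to prepare the data set; the data frame raw and its columns a, b and c are made up for the illustration.

```r
# Toy data frame standing in for the raw data (hypothetical values).
raw = data.frame(
  a = c(1, 2, 3, 4, 5, 6),        # no missing values
  b = c(NA, NA, NA, NA, NA, 1),   # 5 out of 6 values missing
  c = c(NA, 2, 3, NA, 5, 6)       # 2 out of 6 values missing
)

# fraction of missing values in each column.
missing.fraction = colMeans(is.na(raw))

# keep only the columns with at most 80% missing values.
kept = raw[, missing.fraction <= 0.8, drop = FALSE]
names(kept)
```

Here only b exceeds the threshold, so a and c are retained. Using drop = FALSE keeps the result a data frame even if a single column survives.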
In that paper, ViolentCrimesPerPop is the response variable, racepctblack and PctForeignBorn are the sensitive attributes, and the remaining variables are used as predictors.
The data contain too many variables to list here; we refer the reader to the documentation on the UCI Machine Learning Repository.
References
UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/datasets/communities+and+crime
Examples
library(fairml)
data(communities.and.crime)

# short-hand variable names; keep only the rows without missing values.
cc = communities.and.crime[complete.cases(communities.and.crime), ]
r = cc[, "ViolentCrimesPerPop"]
s = cc[, c("racepctblack", "PctForeignBorn")]
p = cc[, setdiff(names(cc), c("ViolentCrimesPerPop", names(s)))]

# nonconvex fair regression model, as in Komiyama et al. (2018).
m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
# fair ridge regression model.
m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)