compas {fairml} | R Documentation
Criminal Offenders Screened in Florida
Description
Data on criminal offenders screened in Florida (US) during 2013-14.
Usage
data(compas)
Format
The data set contains 5855 observations and the following variables:
- age, a continuous variable containing the age (in years) of the person;
- juv_fel_count, a continuous variable containing the number of juvenile felonies;
- decile_score, a continuous variable containing the decile of the COMPAS score;
- juv_misd_count, a continuous variable containing the number of juvenile misdemeanors;
- juv_other_count, a continuous variable containing the number of prior juvenile convictions that are not considered either felonies or misdemeanors;
- v_decile_score, a continuous variable containing the predicted decile of the COMPAS score;
- priors_count, a continuous variable containing the number of prior crimes committed;
- sex, a factor with levels "Female" and "Male";
- two_year_recid, a factor with two levels "Yes" and "No" (whether the person has recidivated within two years);
- race, a factor encoding the race of the person;
- c_jail_in, a numeric variable containing the date on which the person entered jail (normalized between 0 and 1);
- c_jail_out, a numeric variable containing the date on which the person was released from jail (normalized between 0 and 1);
- c_offense_date, a numeric variable containing the date on which the offense was committed;
- screening_date, a numeric variable containing the date on which the person was screened (normalized between 0 and 1);
- in_custody, a numeric variable containing the date on which the person was placed in custody (normalized between 0 and 1);
- out_custody, a numeric variable containing the date on which the person was released from custody (normalized between 0 and 1).
Note
The data set has been pre-processed as in Komiyama et al. (2018), with the following exceptions:
- the race variable has not been reduced to a binary factor with levels "African-American" and "not African-American";
- the variables type_of_assessment and v_type_of_assessment have been dropped from the analysis because they take the same value for all observations;
- variables like c_jail_in and c_jail_out that encode dates have been jointly rescaled to preserve the temporal ordering of events.
In that paper, two_year_recid is the response variable, sex and race are the sensitive attributes, and the remaining variables are used as predictors.
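The joint rescaling of the date variables described in this note can be pictured with a small sketch. It assumes a min-max rescaling that uses a single minimum and maximum shared by all date columns; the exact procedure used to build the data set is not stated here, and both the data frame toy and the helper rescale_jointly are hypothetical names introduced for illustration.

```r
# Hypothetical toy data: two date columns for three people.
toy = data.frame(
  c_jail_in  = as.Date(c("2013-01-10", "2013-06-01", "2014-03-15")),
  c_jail_out = as.Date(c("2013-01-20", "2013-06-03", "2014-12-31"))
)

# Assumed approach (not confirmed by the documentation): min-max rescaling
# with one range shared by all date columns.
rescale_jointly = function(df) {
  # days since 1970-01-01, one matrix column per date variable.
  days = sapply(df, as.numeric)
  lo = min(days, na.rm = TRUE)
  hi = max(days, na.rm = TRUE)
  as.data.frame((days - lo) / (hi - lo))
}

scaled = rescale_jointly(toy)

# check that each person still enters jail before being released.
all(scaled$c_jail_in < scaled$c_jail_out)
```

Rescaling each column separately would instead map the earliest value of every column to 0, so comparisons across columns (such as entry versus release) would no longer be meaningful.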
References
Angwin J, Larson J, Mattu S, Kirchner L (2016). "Machine Bias: There's Software
Used Around the Country to Predict Future Criminals."
https://www.propublica.org
Examples
library(fairml)
data(compas)
# convert the response back to a numeric variable
# (the first factor level becomes 0, the second becomes 1).
compas$two_year_recid = as.numeric(compas$two_year_recid) - 1
# short-hand variable names.
r = compas[, "two_year_recid"]
s = compas[, c("sex", "race")]
p = compas[, setdiff(names(compas), c("two_year_recid", "sex", "race"))]
# nonconvex constrained regression, with the unfairness bound set to 0.05.
m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
# fair ridge regression with the same unfairness bound.
m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
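
The numeric conversion of the response in the example depends on the internal order of the factor levels. As a minimal sketch, using a hypothetical toy factor rather than the real data, the coding can be made explicit by comparing against the level label:

```r
# Hypothetical toy factor standing in for compas$two_year_recid.
recid = factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# as.numeric() returns the position of each level, so this coding
# changes if the levels are stored in a different order.
by_position = as.numeric(recid) - 1

# comparing against the label gives 1 for "Yes" regardless of level order.
by_label = as.numeric(recid == "Yes")

# the same values with the levels stored in the opposite order.
recid_rev = factor(recid, levels = c("Yes", "No"))
by_position_rev = as.numeric(recid_rev) - 1
by_label_rev = as.numeric(recid_rev == "Yes")

identical(by_label, by_label_rev)
```

Checking levels(compas$two_year_recid) before converting shows which of the two labels ends up coded as 1.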