compas {fairness} | R Documentation
Modified COMPAS dataset
Description
compas is a landmark dataset for studying algorithmic (un)fairness. The data come from COMPAS, a tool used
in the USA to predict recidivism (whether a person convicted of a crime will reoffend). The tool was meant to overcome
human biases and offer a fair, algorithmic way to predict recidivism in a diverse population.
However, the algorithm ended up propagating existing social biases and thus offered an unfair
solution to the problem. In this dataset, a model to predict recidivism has already been fit, and its predicted
probabilities and predicted status (0/1 for no/yes) have been appended to the original data.
Usage
compas
Format
A data frame with 6172 rows and 9 variables:
- Two_yr_Recidivism
factor, yes/no for recidivism or no recidivism. This is the outcome (target) variable in this dataset.
- Number_of_Priors
numeric, number of priors, normalized to mean = 0 and standard deviation = 1
- Age_Above_FourtyFive
factor, yes/no for age above 45 years or not
- Age_Below_TwentyFive
factor, yes/no for age below 25 years or not
- Female
factor, female/male for gender
- Misdemeanor
factor, yes/no for having recorded misdemeanor(s) or not
- ethnicity
factor, Caucasian, African American, Asian, Hispanic, Native American or Other
- probability
numeric, predicted probabilities for recidivism, ranges from 0 to 1
- predicted
numeric, predicted values for recidivism, 0/1 for no/yes
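The outcome, group and prediction columns listed above are enough to compare predictions with observed recidivism per group. The following is a minimal sketch and not part of the package: `toy` is a small data frame with made-up values whose column names and types mirror the ones documented here, the 0.5 cutoff is an assumption, and `rates_by_group()` is a hypothetical helper.

```r
# Toy data with made-up values; only the column names and types follow
# the Format section. This is NOT the compas data.
toy <- data.frame(
  Two_yr_Recidivism = factor(
    c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
    levels = c("no", "yes")
  ),
  ethnicity = factor(rep(c("Caucasian", "Hispanic"), each = 4)),
  probability = c(0.81, 0.32, 0.45, 0.10, 0.67, 0.74, 0.20, 0.55)
)

# Hard 0/1 classification at an assumed cutoff of 0.5
toy$predicted <- as.numeric(toy$probability >= 0.5)

# Hypothetical helper: per-group size, accuracy and false positive rate
rates_by_group <- function(df, group = "ethnicity") {
  parts <- split(df, df[[group]], drop = TRUE)

  accuracy <- function(d) {
    actual <- as.numeric(d$Two_yr_Recidivism == "yes")
    mean(d$predicted == actual)
  }

  false_positive_rate <- function(d) {
    negatives <- d$Two_yr_Recidivism == "no"
    if (!any(negatives)) return(NA_real_)
    mean(d$predicted[negatives] == 1)
  }

  data.frame(
    group = names(parts),
    n = vapply(parts, nrow, integer(1), USE.NAMES = FALSE),
    accuracy = vapply(parts, accuracy, numeric(1), USE.NAMES = FALSE),
    false_positive_rate = vapply(parts, false_positive_rate, numeric(1),
                                 USE.NAMES = FALSE),
    row.names = NULL
  )
}

rates <- rates_by_group(toy)
print(rates)
```

In the toy data both groups have the same accuracy (0.75) but different false positive rates (0 versus 0.5), which is the kind of disparity this dataset is typically used to study. Since the helper relies only on the documented column names, it should also accept `compas` once the package is loaded.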
Source
The dataset was downloaded from Kaggle (https://www.kaggle.com/danofer/compass) and has been modified. For example, ethnicity was originally one-hot encoded, the number of priors has been normalized, variables have been renamed, and a prediction model was fit, with its predicted probabilities and predicted status appended to the original dataset.
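Two of these modifications, collapsing one-hot columns into a single factor and normalizing a numeric variable, can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the dataset: the `raw` data frame, its column names and values, and the choice of "Caucasian" as the level for rows with no indicator set are all hypothetical.

```r
# Hypothetical raw layout: one 0/1 indicator column per ethnicity and an
# unscaled count of priors. Names and values are made up for illustration.
raw <- data.frame(
  African_American = c(1, 0, 0, 0),
  Asian            = c(0, 1, 0, 0),
  Hispanic         = c(0, 0, 1, 0),
  priors_count     = c(0, 2, 4, 10)
)

onehot_cols <- c("African_American", "Asian", "Hispanic")
indicators <- as.matrix(raw[onehot_cols])

# For each row, pick the column holding the 1. Rows with no indicator
# set are assigned an assumed reference level ("Caucasian").
ethnicity <- onehot_cols[max.col(indicators, ties.method = "first")]
ethnicity[rowSums(indicators) == 0] <- "Caucasian"
raw$ethnicity <- factor(ethnicity)

# Normalize the count to mean 0 and standard deviation 1.
# scale() returns a one-column matrix, so convert back to a plain vector.
raw$Number_of_Priors <- as.numeric(scale(raw$priors_count))

print(raw[c("ethnicity", "Number_of_Priors")])
```

`scale()` divides by the sample standard deviation, so the result matches the "mean = 0 and standard deviation = 1" description of `Number_of_Priors` in the Format section.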