compas {mlr3fairness} R Documentation

COMPAS Dataset

Description

The COMPAS dataset includes the processed COMPAS data from 2013 to 2014. Data cleaning followed the guidance in the original COMPAS repository. The dataset contains 6172 observations and 14 features. The target column can be either "is_recid" or "two_year_recid"; "two_year_recid" is usually preferred. The column "sex" is set as the protected attribute, although "race" is used more often.
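As a minimal sketch of switching the protected attribute: this assumes that mlr3 and mlr3fairness are installed and that the protected attribute is stored in the task's "pta" column role, as in mlr3fairness 0.3.x. Under those assumptions, "race" could be selected as follows.

```r
library("mlr3")
library("mlr3fairness")

# Load the predefined COMPAS task
task = tsk("compas")

# Inspect the target and the current protected attribute
print(task$target_names)
print(task$col_roles$pta)

# Assumption: the protected attribute lives in the "pta" column role.
# Replace it with "race".
task$col_roles$pta = "race"
print(task$col_roles$pta)
```

Only the column role changes here; the underlying data and the target column are left untouched.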

Derived tasks:

Format

R6::R6Class inheriting from TaskClassif.

Using COMPAS - Known Problems

The COMPAS dataset was collected as part of the ProPublica analysis of machine bias in criminal sentencing. Note that using COMPAS is generally discouraged for the following reasons:

The dataset should therefore not be used to benchmark new fairness algorithms or measures. For a more in-depth treatment, see Bao et al., 2021: It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks. We replicate the dataset here to raise awareness of this issue. Similar issues exist across many datasets widely used in fairness auditing; we therefore consider such issues, e.g. those arising from disparate measurement bias, an important concern in fairness audits.

Pre-processing

Metadata

Construction

mlr_tasks$get("compas")
tsk("compas")
mlr_tasks$get("compas_race_binary")
tsk("compas_race_binary")
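The constructors above can be tried out as in the following sketch, which assumes mlr3 and mlr3fairness are installed. It retrieves both tasks and prints basic information; no particular output is claimed here, and the "pta" column role is an assumption about how the protected attribute is stored.

```r
library("mlr3")
library("mlr3fairness")

# Retrieve both tasks from the mlr3 task dictionary
task = tsk("compas")
task_binary = tsk("compas_race_binary")

# Print a summary of each task (id, dimensions, target, features)
print(task)
print(task_binary)

# Assumption: the protected attribute is stored in the "pta" column role
print(task$col_roles$pta)
print(task_binary$col_roles$pta)
```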

Source

ProPublica Analysis: https://github.com/propublica/compas-analysis

Bao, Michelle, Zhou, Angela, Zottola, A S, Brubach, Brian, Desmarais, Sarah, Horowitz, Seth A, Lum, Kristian, Venkatasubramanian, Suresh (2021). “It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks.” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).

Examples

library("mlr3")
data("compas", package = "mlr3fairness")

[Package mlr3fairness version 0.3.2 Index]