adult {mlr3fairness} R Documentation

Adult Dataset

Description

Dataset used to predict whether annual income exceeds $50K based on census data, also known as the "Census Income" dataset. The train dataset contains 13 features and 30178 observations; the test dataset contains 13 features and 15315 observations. The target column is "target", a binary factor with levels 1: <=50K and 2: >50K for annual income. The column "sex" is set as the protected attribute.
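The properties described above can be checked directly on the shipped data. The following is a minimal sketch, assuming the mlr3fairness package is installed; it only prints the dimensions, target levels, and group counts rather than asserting particular values.

```r
# Load both splits shipped with mlr3fairness
data("adult_train", package = "mlr3fairness")
data("adult_test", package = "mlr3fairness")

# Rows and columns of each split
dim(adult_train)
dim(adult_test)

# Levels of the binary target and the class balance in the training split
levels(adult_train[["target"]])
table(adult_train[["target"]])

# Distribution of the protected attribute
table(adult_train[["sex"]])
```

Comparing the two tables gives a first impression of how imbalanced the target and the protected attribute are before any model is fitted.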

Derived tasks

Using Adult - Known Problems

The adult dataset has several known limitations, such as its age, limited documentation, and outdated feature encodings (Ding et al., 2021). Furthermore, the chosen income threshold of $50K strongly affects the outcome of any analysis: "In many cases, the $50k threshold understates and misrepresents the broader picture" (Ding et al., 2021). As a result, conclusions about real-world implications are severely limited.

We nevertheless replicate the dataset here because it is a widely used benchmark dataset and can still serve that purpose.

Pre-processing

Metadata

Source

Dua, Dheeru, Graff, Casey (2017). “UCI Machine Learning Repository.” http://archive.ics.uci.edu/ml/.

Ding, Frances, Hardt, Moritz, Miller, John, Schmidt, Ludwig (2021). “Retiring adult: New datasets for fair machine learning.” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).

Examples

# Load mlr3 and the train/test splits shipped with mlr3fairness
library("mlr3")
data("adult_test", package = "mlr3fairness")
data("adult_train", package = "mlr3fairness")
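Once loaded, the training split can be wrapped in an mlr3 classification task with "sex" marked as the protected attribute. The sketch below assumes mlr3 and mlr3fairness are both installed; the task id "adult_train_example" is a hypothetical name chosen for illustration, not one defined by the package.

```r
library("mlr3")
library("mlr3fairness")

data("adult_train", package = "mlr3fairness")

# Build a classification task from the training split.
# The id is an illustrative, hypothetical name.
task = as_task_classif(adult_train, target = "target", id = "adult_train_example")

# Mark "sex" as the protected attribute via the "pta" column role
# provided by mlr3fairness.
task$col_roles$pta = "sex"

print(task)
```

Assigning the "pta" role does not remove "sex" from the feature set; it only tells fairness measures which column defines the groups to compare.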

[Package mlr3fairness version 0.3.2 Index]