R: Census Income

adult {fairml}

R Documentation

Census Income

Description

Predict whether income exceeds $50K per year using the 1994 U.S. Census data.

Usage

data(adult)

Format

The data contain 30162 observations and 14 variables. See the UCI Machine Learning Repository for details.

Note

The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:

the data do not include the test sample from the UCI repository;
the variables "capital_gain" and "capital_loss" have been scaled by 1/1000.
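
Because the two capital variables are stored in thousands of dollars, dollar amounts can be recovered by multiplying by 1000. A minimal sketch, assuming the fairml package is installed and that the columns are named "capital_gain" and "capital_loss" as quoted above (gain_dollars and loss_dollars are illustrative names only):

```r
library(fairml)
data(adult)

# capital_gain and capital_loss are stored scaled by 1/1000 (per the note
# above), so multiplying by 1000 gives back the dollar amounts.
gain_dollars = adult[, "capital_gain"] * 1000
loss_dollars = adult[, "capital_loss"] * 1000

summary(gain_dollars)
summary(loss_dollars)
```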

In that paper, income is the response variable, sex and race are the sensitive attributes and the remaining variables are used as predictors.

The data contain the following variables:

age as a numeric variable;
workclass, a factor with 8 levels encoding the type of employment ("Private", "Self-emp-not-inc", "Federal-gov", etc.);
education, a factor with 10 levels from "Preschool" to "Doctorate";
education-num, the number of years of education;
marital-status, a factor with 7 levels ("Married-civ-spouse", "Divorced", "Never-married", etc.);
occupation, a factor with 14 levels encoding the field of employment ("Tech-support", "Craft-repair", etc.);
relationship, a factor with 6 levels ("Wife", "Own-child", etc.);
race, a factor with levels "White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other" and "Black";
sex, a factor with levels "Female" and "Male";
capital-gain as a numeric variable;
capital-loss as a numeric variable;
native-country as a factor with two levels, "United-States" and "Non-United-States";
hours-per-week as a numeric variable.
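
The variable types and factor levels listed above can be checked directly on the loaded data. A minimal sketch, assuming the fairml package is installed; it uses only the column names "income", "sex" and "race" that appear in the Examples section:

```r
library(fairml)
data(adult)

# dimensions of the data; the Format section states 30162 observations
# and 14 variables.
dim(adult)

# class of each variable and, for factors, the first few levels.
str(adult)

# levels of the two sensitive attributes.
levels(adult[, "sex"])
levels(adult[, "race"])

# cross-tabulation of the response against one sensitive attribute.
table(adult[, "sex"], adult[, "income"])
```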

References

UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/adult

Examples

data(adult)

# short-hand variable names: r is the response, s the sensitive attributes,
# p the predictors.
r = adult[, "income"]
s = adult[, c("sex", "race")]
p = adult[, setdiff(names(adult), c("income", "sex", "race"))]

## Not run: 
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)

## End(Not run)

[Package fairml version 0.8 Index]