adult {fairml} | R Documentation |
Census Income
Description
Predict whether income exceeds $50K per year using the U.S. 1994 Census data.
Usage
data(adult)
Format
The data contain 30162 observations and 14 variables. See the UCI Machine Learning Repository for details.
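The stated dimensions can be checked after loading the data. The following is a minimal sketch, assuming the fairml package is installed; describe_dims() is a hypothetical helper introduced here for illustration and is not part of fairml.

```r
# Hypothetical helper (not part of fairml): report the size of a data frame.
describe_dims = function(data) {
  c(observations = nrow(data), variables = ncol(data))
}

# Assumption: fairml is installed; the block is skipped otherwise.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(adult, package = "fairml")
  print(describe_dims(adult))  # compare with the counts given under Format
  str(adult)                   # type of each variable (numeric or factor)
}
```

The output of str() also shows the exact column names, which is worth checking before indexing columns by name.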
Note
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
- the data do not include the test sample from the UCI repository;
- the variables "capital_gain" and "capital_loss" have been scaled by 1/1000.
In that paper, income is the response variable, sex and race are the sensitive attributes and the remaining variables are used as predictors.
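Because the capital variables are stored scaled by 1/1000, multiplying them by 1000 should recover the original units. The sketch below assumes the fairml package is installed; unscale_capital() is a hypothetical helper, and its name pattern is an assumption that accepts either underscore or hyphen spellings of the column names.

```r
# Hypothetical helper (not part of fairml): undo the 1/1000 scaling of the
# capital gain and capital loss columns, leaving all other columns untouched.
unscale_capital = function(data, factor = 1000) {
  # Assumption: columns are named like "capital_gain" or "capital-gain".
  cols = grep("^capital[-_.](gain|loss)$", names(data))
  for (j in cols) {
    data[[j]] = data[[j]] * factor
  }
  data
}

# Assumption: fairml is installed; the block is skipped otherwise.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(adult, package = "fairml")
  adult_unscaled = unscale_capital(adult)
  print(summary(adult_unscaled[grep("^capital", names(adult_unscaled))]))
}
```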
The data contain the following variables:
- age, as a numeric variable;
- workclass, a factor with 8 levels encoding the type of employment ("Private", "Self-emp-not-inc", "Federal-gov", etc.);
- education, a factor with 10 levels from "Preschool" to "Doctorate";
- education-num, the number of years in education;
- marital-status, a factor with 7 levels from "Married-civ-spouse" to "Divorced" and "Never-married";
- occupation, a factor with 14 levels encoding the field of employment ("Tech-support", "Craft-repair", etc.);
- relationship, a factor with 6 levels ("Wife", "Own-child", etc.);
- race, a factor with levels "White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other" and "Black";
- sex, a factor with levels "Female" and "Male";
- capital-gain, as a numeric variable;
- capital-loss, as a numeric variable;
- native-country, as a factor with two levels "United-States" and "Non-United-States";
- hours-per-week, as a numeric variable.
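The level counts listed above can be compared against the loaded data. This is a minimal sketch, assuming the fairml package is installed; count_levels() is a hypothetical helper and not part of fairml.

```r
# Hypothetical helper (not part of fairml): number of levels of each factor
# column in a data frame, returned as a named integer vector.
count_levels = function(data) {
  vapply(Filter(is.factor, data), nlevels, integer(1))
}

# Assumption: fairml is installed; the block is skipped otherwise.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(adult, package = "fairml")
  print(count_levels(adult))
  print(levels(adult[, "race"]))  # "race" is the column name used in the Examples
}
```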
References
UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/adult
Examples
library(fairml)
data(adult)
# short-hand variable names.
r = adult[, "income"]
s = adult[, c("sex", "race")]
p = adult[, setdiff(names(adult), c("income", "sex", "race"))]
## Not run:
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
## End(Not run)
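Before fitting a fair model, it can help to look at how the response is distributed within each sensitive group. The sketch below assumes the fairml package is installed and uses only base R tabulation; response_by_group() is a hypothetical helper, not a fairml function.

```r
# Hypothetical helper (not part of fairml): proportion of each response level
# within each group, so that every row of the result sums to 1.
response_by_group = function(group, response) {
  prop.table(table(group, response), margin = 1)
}

# Assumption: fairml is installed; the block is skipped otherwise.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(adult, package = "fairml")
  print(response_by_group(adult[, "sex"], adult[, "income"]))
  print(response_by_group(adult[, "race"], adult[, "income"]))
}
```

These tables are purely descriptive and do not adjust for the remaining predictors.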