national.longitudinal.survey {fairml} | R Documentation
Income and Labour Market Activities
Description
Results from a U.S. Bureau of Labor Statistics survey that gathers information on the labour market activities and other life events of several groups.
Usage
data(national.longitudinal.survey)
Format
The data contains 4908 observations and the following variables:
- age, a numeric variable containing the interviewee's age in years;
- race, a factor with 20 levels denoting various racial/ethnic origins;
- gender, a factor with levels "Male" and "Female";
- grade90, a factor containing the highest completed school grade from "3RD GRADE" to "8TH YR COL OR MORE", with 18 levels;
- income06, a numeric variable, income in 2006 in 10000-USD units;
- income96, a numeric variable, income in 1996 in 10000-USD units;
- income90, a numeric variable, income in 1990 in 10000-USD units;
- partner, a factor encoding whether the interviewee has a partner, with levels "No" and "Yes";
- height, a numeric variable, the height of the interviewee;
- weight, a numeric variable, the weight of the interviewee;
- famsize, a numeric variable, the number of family members;
- genhealth, a factor with levels "Excellent", "Very Good", "Good", "Fair", "Poor" encoding the general health status of the interviewee;
- illegalact, a numeric variable containing the number of illegal acts committed by the interviewee;
- charged, a numeric variable containing the number of illegal acts for which the interviewee has been charged;
- jobsnum90, a numeric value, the number of different jobs ever reported;
- afqt89, a numeric value, the percentile score of the "Profiles, Armed Forces Qualification Test" (AFQT);
- typejob90, a factor with 13 levels encoding different job types;
- jobtrain90, a factor with levels "No" and "Yes" encoding whether the job was classified as training.
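The three income variables are stored in units of 10000 USD, so a stored value of 1.5 stands for 15000 USD. A minimal sketch of the conversion, using a toy data frame whose values are made up for illustration and are not taken from the survey:

```r
# toy data frame with made-up values mimicking the three income columns
# (hypothetical numbers, not survey data).
toy = data.frame(income90 = c(0.8, 1.5, 2.25),
                 income96 = c(1.2, 2.0, 3.10),
                 income06 = c(2.0, 2.6, 4.75))
# convert from 10000-USD units to plain USD.
toy.usd = toy * 10000
print(toy.usd)
```

The same multiplication should apply to the income90, income96 and income06 columns of the actual data set.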
Note
The data set has been pre-processed differently from Komiyama et al. (2018). In particular:
- the variables income96 and income06 have been retained as alternative responses;
- the variables height, weight, race, partner and famsize have been retained;
- the variables grade90 and genhealth are coded as ordered factors because they do not make sense on a numeric scale.
In that paper, income90 is the response variable, and gender and age are the sensitive attributes.
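To illustrate what coding a variable as an ordered factor provides, here is a small base-R sketch built on the five genhealth levels listed under Format. The responses are made up, and the direction of the ordering (from "Excellent" to "Poor") is an assumption taken from the order in which the levels are listed; the ordering actually used in the data set can be checked with levels().

```r
# the five health levels, in the order given in the Format section
# (assumed here to also be the order of the factor levels).
health.levels = c("Excellent", "Very Good", "Good", "Fair", "Poor")
# made-up responses, for illustration only.
x = c("Good", "Poor", "Excellent", "Fair")
h = factor(x, levels = health.levels, ordered = TRUE)
# ordered factors can be compared against a level: which responses
# come before "Good" in the level ordering?
before.good = h < "Good"
print(before.good)
# integer codes give each level's position in the ordering, but treating
# them as numbers would assume equal spacing between adjacent levels.
codes = as.integer(h)
print(codes)
```

An ordered factor keeps the ranking of the levels without committing to a numeric distance between them, which is the concern the note raises about a numeric scale.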
References
U.S. Bureau of Labor Statistics.
https://www.bls.gov/nls/
Examples
library(fairml)
data(national.longitudinal.survey)
# short-hand name for the data set.
nn = national.longitudinal.survey
# remove alternative response variables.
nn = nn[, setdiff(names(nn), c("income96", "income06"))]
# response, sensitive attributes and predictors.
r = nn[, "income90"]
s = nn[, c("gender", "age")]
p = nn[, setdiff(names(nn), c("income90", "gender", "age"))]
# fit with nclm().
m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
# fit with frrm(), using the same unfairness level.
m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)