bank {fairml}    R Documentation
Bank Marketing
Description
Direct marketing campaigns (phone calls) of a Portuguese banking institution, aimed at getting clients to subscribe to a term deposit.
Usage
data(bank)
Format
The data contains 41188 observations and 19 variables. See the UCI Machine Learning Repository for details.
Note
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
- the variable duration has been dropped in order to learn a realistic predictive model;
- the variable pdays has been dropped because it is not defined for the vast majority of samples;
- observations where loan is "unknown" have been dropped because the corresponding regression coefficient estimated by glm() is NA;
- the three observations where default is "yes" have been dropped to avoid errors in cross-validation (if all three observations fall in the test fold, it is impossible to compute predictions for them).
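The cross-validation problem mentioned in the last item can be reproduced on made-up data. The sketch below is only an illustration under stated assumptions: the data frame d, its columns and the train/test split are hypothetical, and the training fold is assumed to have had its unused factor levels dropped. It uses only base R.

```r
# Hypothetical toy data: the level "yes" of `default` occurs only three times.
set.seed(1)
d = data.frame(
  x = rnorm(40),
  default = factor(c(rep(c("no", "unknown"), length.out = 37), rep("yes", 3))),
  y = factor(rep(c("no", "no", "yes"), length.out = 40))
)

# Assumed split: all three "yes" observations end up in the test fold.
train = droplevels(d[1:37, ])
test = d[38:40, ]

# The model never sees the level "yes" during fitting.
fit = glm(y ~ x + default, data = train, family = binomial)

# predict() stops because `default` has a level unknown to the model;
# the error is caught and its message is stored instead.
result = tryCatch(
  predict(fit, newdata = test),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here result holds an error message rather than predictions, which is the failure that dropping the three observations avoids.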
In Zafar et al. (2019), subscribed is the response variable, age is the sensitive attribute and the remaining variables are used as predictors.
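The pre-processing exceptions described in this note can be sketched on a small stand-in for the raw data. The data frame raw and all its values are made up for illustration, and the response name y is an assumption about the raw file; this is not the code used to build the packaged data set.

```r
# Toy stand-in for the raw data; the values are invented.
raw = data.frame(
  age = c(30, 45, 52, 38, 27, 61),
  duration = c(120, 300, 45, 210, 80, 500),
  pdays = c(999, 999, 3, 999, 999, 999),
  loan = factor(c("no", "yes", "unknown", "no", "no", "yes")),
  default = factor(c("no", "unknown", "no", "yes", "no", "no")),
  y = factor(c("no", "yes", "no", "no", "yes", "no"))
)

# Drop the two variables named in the note.
clean = raw[, setdiff(names(raw), c("duration", "pdays"))]

# Drop observations with unknown loan status or with default equal to "yes".
clean = clean[clean$loan != "unknown" & clean$default != "yes", ]

# Remove the factor levels that no longer occur in the data.
clean = droplevels(clean)

nrow(clean)            # 4 rows remain
levels(clean$loan)     # "no" "yes"
levels(clean$default)  # "no" "unknown"
```

Calling droplevels() after subsetting matters here: without it the empty levels would stay in the factors and produce NA coefficients in glm().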
The data contains the following variables:
- age, as a numeric variable;
- job, a factor with 12 levels ranging from "blue-collar" to "services";
- marital, a factor with levels "divorced", "married", "single" and "unknown";
- education, a factor with 8 levels ranging from "basic.4y" to "university.degree";
- default, a factor with levels "no" and "unknown";
- housing, a factor with levels "yes" and "no";
- loan, a factor with levels "yes" and "no";
- contact, a factor with levels "cellular" and "telephone";
- month, a factor with 12 levels for the months of the year;
- day_of_week, a factor with 7 levels for the days of the week;
- campaign, the number of contacts performed during this campaign;
- previous, the number of contacts performed before this campaign;
- poutcome, a factor with levels "failure", "nonexistent" and "success";
- emp_var_rate, the (numeric) quarterly employment variation rate;
- cons_price_idx, the (numeric) monthly consumer price index;
- cons_conf_idx, the (numeric) monthly consumer confidence index;
- euribor3m, the (numeric) Euribor 3-month rate;
- nr_employed, a numeric variable with the number of employees in the company in that quarter;
- subscribed, a factor with levels "yes" and "no".
References
UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/bank+marketing
Examples
# load the package that provides the data and zlrm().
library(fairml)
data(bank)
# remove loans with unknown status, the corresponding coefficient is NA in glm().
bank = bank[bank$loan != "unknown", ]
# short-hand variable names.
r = bank[, "subscribed"]
s = bank[, c("age")]
p = bank[, setdiff(names(bank), c("subscribed", "age"))]
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)