roc_method {predfairness}R Documentation

Reject Option based Classification method

Description

Reject Option based Classification (ROC) method for discrimination reduction in predictive models. Given a probabilistic model for binary classifications, ROC is a post processing method which changes instances' classification labels in a defined probability interval. In a probabilistic model, the decision criteria is defined by simply choosing the category with the higher estimated probability. Considering a binary classification, a model returns two complementary probabilities.

Assuming that discriminatory classifications occurs near rejection boundary (when probabilities are near 0.5), the ROC method defines an interval in which probabilities can be considered next to the boundary. Then, once the interval size ([0.5, theta]) is defined, the method looks for the higher probability between the two classes. If a privileged person receives a positive classification with probability between 0.5 and theta, the method turn this classification to negative. Conversely, if the method finds a deprived person with negative classification probability between 0.5 and theta, then it changes her to positive.

Usage

roc_method(
  pred_mod,
  positive_col,
  positive_class,
  negative_col,
  sensible_col,
  privileged_group,
  classification_col,
  theta
)

Arguments

pred_mod

data frame - predictions and its probabilities with respect to each category.

positive_col

string - positive classification probabilities column name

positive_class

string - positive classification label

negative_col

string - negative classification probabilities column name

sensible_col

string - sensible attribute column name

privileged_group

string - privileged group label

classification_col

string - classifications column name

theta

numeric - classification probabilities threshold

Details

In a binary classification, the highest probability is always greater than 0.5. Considering already classified instances, and selecting people from the privileged-positive classified group and deprived-negative classified group, the method searches for those with the maximum probability less than theta. In this case, the function will change the instance's classification label and replace the two probabilities with their complementary. The user must run the data frame with predictions, the column name with the sensible attribute, as well as the privileged group name through the ROC method. Also, the user must add the classification column name, the categories probabilities columns names and the name of the category considered the positive one. This function returns a data frame with updated probabilities and classifications.

Value

Returns a new data frame with updated classifications and probabilities, maintaining the structure (columns and its names) of the original data frame, ran in the method.

Author(s)

Leonardo Paes Vieira

References

F. Kamiran, A. Karim and X. Zhang, "Decision Theory for Discrimination-Aware Classification," 2012 IEEE 12th International Conference on Data Mining, 2012, pp. 924-929, doi: 10.1109/ICDM.2012.45.

Examples


 data('adult.data')

 adult.data$income = ifelse(test =  adult.data$income == '>50K',
                             yes = 1, no = 0)

 adult.data = adult.data[, colnames(adult.data) %in%
                           c('age', 'education', 'sex',
                             'income', 'capital_gain')]

 adult.data = adult.data[sample(1:nrow(adult.data), size = 100, replace = FALSE), ]

 ##### Logistic Regression

 if (!requireNamespace("stats", quietly = TRUE)) {
 stop("Package \"stats\" needed for this example to work.",
 call. = FALSE)}

 mod = glm(formula =  income ~., data = adult.data, family = binomial(link = 'logit'))

 ### The 'predict' function returns the classes probabilities
 ### automatically for caret (package) models

 pred = data.frame(greater = mod$fitted.values, less =  1 - mod$fitted.values, sex = adult.data$sex,
                   classification = ifelse(mod$fitted.values >= 0.5, 'greater', 'less'))

 theta = 0.6

 pred_changed = roc_method(pred_mod = pred, positive_col = 'greater',
                          positive_class = 'greater', negative_col = 'less',
                          sensible_col = 'sex', privileged_group = 'Male',
                          classification_col = 'classification',
                          theta = theta)


 pred_changed

[Package predfairness version 0.1.0 Index]