fairness.vld {PDtoolkit}    R Documentation

Model fairness validation

Description

fairness.vld performs fairness validation for a given sensitive attribute and a selected outcome. The sensitive attribute should be a categorical variable with a reasonable number of modalities, while the outcome can be categorical (e.g. reject/accept indicator or rating grade) or continuous (e.g. interest rate or amount). Depending on the type of model outcome (see argument mod.outcome.type), a chi-square test or a Wald test is applied.
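As a rough illustration of the idea behind the discrete case, statistical parity can be checked with a chi-square test of independence between the sensitive attribute and the model outcome. The sketch below uses base R and simulated data only; the names toy, group and decision are hypothetical, and this is not necessarily how fairness.vld computes its results internally.

```r
# Minimal sketch (assumptions: base R only, simulated data, hypothetical names)
set.seed(123)
toy <- data.frame(
  group    = sample(c("A", "B"), size = 500, replace = TRUE),  # sensitive attribute
  decision = rbinom(500, size = 1, prob = 0.6)                 # model outcome (1 = accept)
)
# contingency table: sensitive attribute vs. model outcome
tab <- table(toy$group, toy$decision)
# chi-square test of independence (without continuity correction)
sp.test <- chisq.test(tab, correct = FALSE)
# compare the test p-value with the chosen significance level
alpha <- 0.05
sp.rejected <- sp.test$p.value < alpha
sp.test$p.value
sp.rejected
```

If sp.rejected is TRUE, the hypothesis that the model outcome is independent of the sensitive attribute is rejected at the chosen significance level.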

Usage

fairness.vld(
  db,
  sensitive,
  obs.outcome,
  mod.outcome,
  conditional = NULL,
  mod.outcome.type,
  p.value
)

Arguments

db

Data frame containing the sensitive attribute, the observed outcome, the model outcome and, optionally, the conditional attribute.

sensitive

Name of sensitive attribute within db.

obs.outcome

Name of observed outcome within db.

mod.outcome

Name of model outcome within db.

conditional

Name of conditional attribute within db. It is used for calculation of conditional statistical parity. Default value is NULL.

mod.outcome.type

Type of model outcome. Possible values are "disc" (discrete outcome) and "cont" (continuous outcome).

p.value

Significance level of the applied statistical test (chi-square or Wald test).
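For a continuous model outcome, one common way to obtain a Wald-type test is to regress the outcome on the sensitive attribute and test whether its coefficient equals zero. The following base R sketch illustrates this under the assumption of a simple linear regression; the names toy, group and rate are hypothetical, the data are simulated, and the package may construct its test differently.

```r
# Minimal sketch (assumptions: linear regression, simulated data, hypothetical names)
set.seed(321)
toy <- data.frame(
  group = factor(sample(c("A", "B"), size = 400, replace = TRUE)),  # sensitive attribute
  rate  = runif(400, min = 0.03, max = 0.10)                        # continuous model outcome
)
fit <- lm(rate ~ group, data = toy)
# coefficient of the sensitive attribute and its estimated variance
b <- coef(fit)["groupB"]
v <- vcov(fit)["groupB", "groupB"]
# Wald statistic and p-value from the chi-square distribution with 1 degree of freedom
wald.stat <- as.numeric(b^2 / v)
wald.pval <- pchisq(wald.stat, df = 1, lower.tail = FALSE)
# compare the p-value with the chosen significance level
alpha <- 0.05
wald.rejected <- wald.pval < alpha
```

With a two-level sensitive attribute the Wald statistic equals the squared t-statistic of the group coefficient; with more levels, a joint test over all group coefficients would be needed.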

Value

The command fairness.vld returns a list of up to three data frames.
The first object (SP) provides the results of statistical parity testing.
The second object (CSP) provides the results of conditional statistical parity testing. This object is returned only if a conditional attribute is supplied.
The third object (EO) provides the results of equal opportunity testing.
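To give an intuition for the conditional statistical parity and equal opportunity checks, the sketch below repeats a chi-square independence test on subsets of simulated data: once per level of a conditional attribute, and once restricted to a single class of the observed outcome. All names (toy, group, observed, decision, segment, chisq.pval) are hypothetical, and the choice of which observed class to condition on is an assumption; the exact definitions used by fairness.vld may differ.

```r
# Minimal sketch (assumptions: base R only, simulated data, hypothetical names)
set.seed(42)
n <- 600
toy <- data.frame(
  group    = sample(c("A", "B"), n, replace = TRUE),       # sensitive attribute
  observed = rbinom(n, 1, 0.7),                            # observed outcome
  decision = rbinom(n, 1, 0.6),                            # model outcome
  segment  = sample(c("low", "high"), n, replace = TRUE)   # conditional attribute
)
# helper: p-value of the chi-square independence test on a subset of the data
chisq.pval <- function(d) {
  chisq.test(table(d$group, d$decision), correct = FALSE)$p.value
}
# conditional statistical parity: one test per level of the conditional attribute
csp.pval <- sapply(split(toy, toy$segment), chisq.pval)
# equal opportunity: test restricted to one observed-outcome class (here observed == 1)
eo.pval <- chisq.pval(toy[toy$observed == 1, ])
csp.pval
eo.pval
```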

References

Hurlin, Christophe and Perignon, Christophe and Saurin, Sebastien (2022), The Fairness of Credit Scoring Models. HEC Paris Research Paper No. FIN-2021-1411

Examples

suppressMessages(library(PDtoolkit))
#build hypothetical model
data(loans)
#numeric risk factors
#num.rf <- sapply(loans, is.numeric)
#num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
num.rf <- c("Credit Amount", "Age (years)")
#discretize numeric risk factors using ndr.bin from monobin package
loans[, num.rf] <- sapply(num.rf, function(x) 
   ndr.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
str(loans)
#run stepMIV
rf <- c("Account Balance", "Payment Status of Previous Credit", 
       "Purpose", "Value Savings/Stocks", "Credit Amount",
       "Age (years)", "Instalment per cent", "Foreign Worker")
res <- stepMIV(start.model = Creditability ~ 1, 
   miv.threshold = 0.02, 
   m.ch.p.val = 0.05,
   coding = "WoE",
   coding.start.model = FALSE,
   db = loans[, c("Creditability", rf)])
#print coefficients
summary(res$model)$coefficients

#prepare data frame for fairness validation
db.fa <- data.frame(Creditability = loans$Creditability, 
   mpred = predict(res$model, type = "response", newdata = res$dev.db))
#add hypothetical reject/accept indicator 
db.fa$rai <- ifelse(db.fa$mpred > 0.5, 1, 0)
#add hypothetical rating
db.fa$rating <- sts.bin(x = round(db.fa$mpred, 4), y = db.fa$Creditability)[[2]]
#add hypothetical interest rate
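#one rate per rating grade, so the mapping does not depend on a fixed number of grades
rating.grades <- sort(unique(db.fa$rating))
ir.r <- seq(0.03, 0.10, length.out = length(rating.grades))
names(ir.r) <- rating.grades
db.fa$ir <- ir.r[db.fa$rating]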
#add hypothetical sensitive attribute
db.fa$sensitive.1 <- ifelse(loans$"Sex & Marital Status" %in% 2, 1, 0) #not in the model
db.fa$sensitive.2 <- ifelse(loans$"Age (years)" %in% "03 [35,Inf)", 1, 0) #in the model
#add some attributes for calculation of conditional statistical parity
db.fa$"Credit Amount" <- loans$"Credit Amount" 
head(db.fa)

#discrete model outcome - sensitive attribute not in a model
fairness.vld(db = db.fa, 
	 sensitive = "sensitive.1", 
	 obs.outcome = "Creditability", 
	 mod.outcome = "rai",
	 conditional = "Credit Amount", 
	 mod.outcome.type = "disc", 
	 p.value = 0.05)
##discrete model outcome - sensitive attribute in a model
#fairness.vld(db = db.fa, 
#		 sensitive = "sensitive.2", 
#		 obs.outcome = "Creditability", 
#		 mod.outcome = "rai",
#		 conditional = "Credit Amount", 
#		 mod.outcome.type = "disc", 
#		 p.value = 0.05)
##continuous outcome - sensitive attribute not in a model
#fairness.vld(db = db.fa, 
#		 sensitive = "sensitive.1", 
#		 obs.outcome = "Creditability", 
#		 mod.outcome = "ir",
#		 conditional = "Credit Amount", 
#		 mod.outcome.type = "cont", 
#		 p.value = 0.05)
#continuous outcome - sensitive attribute in a model
fairness.vld(db = db.fa, 
	 sensitive = "sensitive.2", 
	 obs.outcome = "Creditability", 
	 mod.outcome = "ir",
	 conditional = "Credit Amount", 
	 mod.outcome.type = "cont", 
	 p.value = 0.05)

[Package PDtoolkit version 1.2.0 Index]