synthetic data sets {fairml} | R Documentation
Synthetic data set to test fair models
Description
A synthetic data set used for test cases in the fairml package.
Usage
data(vu.test)
Format
The data are stored as a list with the following elements:
- gaussian, binomial, poisson, coxph and multinomial are response variables for the different families;
- X, a numeric matrix containing 3 predictors called X1, X2 and X3;
- S, a numeric matrix containing 3 sensitive attributes called S1, S2 and S3.
Note
This data set is called vu.test because it is generated from very unfair models in which sensitive attributes explain the lion's share of the overall explained variance or deviance.
The code used to generate the predictors and the sensitive attributes is as follows.
library(mvtnorm)
sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
n = 1000
X = rmvnorm(n, mean = rep(0, 6), sigma = sigma)
S = X[, 4:6]
X = X[, 1:3]
colnames(X) = c("X1", "X2", "X3")
colnames(S) = c("S1", "S2", "S3")
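As a rough sanity check on the correlation structure described above, the sketch below draws six jointly normal columns and inspects their empirical correlations. It is an illustration only: it uses a base-R Cholesky construction as a stand-in for mvtnorm's rmvnorm(), and the seed and the helper names Z, XS and emp are assumptions that do not appear in the original code.

```r
# Base-R stand-in for rmvnorm(): standard normal draws multiplied by the
# Cholesky factor of sigma (an assumption, used to avoid the mvtnorm dependency).
set.seed(42)  # arbitrary seed, not part of the original documentation

sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
n = 1000

Z = matrix(rnorm(n * 6), nrow = n, ncol = 6)
# chol() returns the upper-triangular R with t(R) %*% R == sigma, so the rows
# of Z %*% R have covariance matrix sigma.
XS = Z %*% chol(sigma)

X = XS[, 1:3]
S = XS[, 4:6]
colnames(X) = c("X1", "X2", "X3")
colnames(S) = c("S1", "S2", "S3")

# All off-diagonal entries should be in the neighbourhood of 0.3.
emp = cor(cbind(X, S))
round(emp, 2)
```

Because every predictor is correlated with every sensitive attribute, dropping S from a model does not remove its influence: part of it is still carried by X.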
The continuous response in gaussian is produced as follows.
gaussian = 2 + 2 * X[, 1] + 3 * X[, 2] + 4 * X[, 3] +
             5 * S[, 1] + 6 * S[, 2] + 7 * S[, 3] + rnorm(n, sd = 10)
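One informal way to see why the generating model is "very unfair" is to compare how much variance the predictors and the sensitive attributes each explain on their own. The sketch below does this with ordinary least squares on freshly simulated data; it is not the unfairness measure used by fairml, and the seed, the Cholesky-based simulation and the r2.* names are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, not part of the original documentation

# Simulate predictors and sensitive attributes (base-R stand-in for rmvnorm()).
n = 1000
sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
XS = matrix(rnorm(n * 6), nrow = n, ncol = 6) %*% chol(sigma)
X = XS[, 1:3]
S = XS[, 4:6]

# Same generating equation as in the documentation.
gaussian = 2 + 2 * X[, 1] + 3 * X[, 2] + 4 * X[, 3] +
             5 * S[, 1] + 6 * S[, 2] + 7 * S[, 3] + rnorm(n, sd = 10)

# R-squared of three nested or competing linear models.
r2.full = summary(lm(gaussian ~ X + S))$r.squared
r2.X = summary(lm(gaussian ~ X))$r.squared
r2.S = summary(lm(gaussian ~ S))$r.squared

round(c(full = r2.full, predictors.only = r2.X, sensitive.only = r2.S), 3)
```

The sensitive attributes have the larger coefficients (5, 6, 7 against 2, 3, 4), so the model using only S should explain noticeably more variance than the model using only X.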
The discrete response in binomial is produced as follows.
nu = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] +
       0.8 * S[, 1] + 0.9 * S[, 2] + 1.0 * S[, 3]
binomial = rbinom(n = nrow(X), size = 1, prob = exp(nu) / (1 + exp(nu)))
binomial = as.factor(binomial)
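The expression exp(nu) / (1 + exp(nu)) is the inverse logit of the linear predictor. As a small aside, the sketch below checks that it agrees with base R's plogis() on a grid of values; the grid and the p.* names are illustrative assumptions, not part of the original code.

```r
# Grid of linear-predictor values (illustrative only).
nu = seq(-5, 5, by = 0.5)

p.manual = exp(nu) / (1 + exp(nu))  # form used in the documentation
p.plogis = plogis(nu)               # inverse logit from the stats package

all.equal(p.manual, p.plogis)

# For very large nu the manual form overflows to Inf / Inf, whereas plogis()
# stays finite. This does not matter for the moderate values generated here.
exp(800) / (1 + exp(800))
plogis(800)
```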
The log-linear response in poisson is produced as follows.
nu = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] +
       0.8 * S[, 1] + 0.9 * S[, 2] + 1.0 * S[, 3]
poisson = rpois(n = nrow(X), lambda = exp(nu))
The response for the Cox proportional hazards model in coxph is produced as follows.
fx = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] +
       0.8 * S[, 1] + 0.9 * S[, 2] + 1.0 * S[, 3]
hx = exp(fx)
ty = rexp(length(fx), hx)
tcens = rbinom(n = length(fx), prob = 0.3, size = 1)
coxph = cbind(time = ty, status = 1 - tcens)
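To make the censoring mechanism above concrete, the sketch below repeats the same steps with a hypothetical linear predictor (plain standard normal draws instead of the combination of X and S) and checks the share of observed events. The seed and the stand-in fx are assumptions made to keep the example self-contained.

```r
set.seed(7)  # arbitrary seed, not part of the original documentation

n = 1000
fx = rnorm(n)              # hypothetical stand-in for the linear predictor
hx = exp(fx)               # individual hazard rates
ty = rexp(length(fx), hx)  # exponential event times with rate hx

# Each observation is flagged as censored with probability 0.3, independently
# of its event time; status = 1 marks an observed event.
tcens = rbinom(n = length(fx), prob = 0.3, size = 1)
coxph = cbind(time = ty, status = 1 - tcens)

head(coxph)
mean(coxph[, "status"])  # share of observed events, expected to be near 0.7
```

Read this way, the recorded time is always the simulated event time, and the status column alone decides whether it is treated as an event or as a censoring time.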
The discrete response in multinomial is produced as follows.
nu1 = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] +
        0.8 * S[, 1] + 0.9 * S[, 2] + 1.0 * S[, 3]
nu2 = 1 + 0.2 * X[, 1] + 0.2 * X[, 2] + 0.2 * X[, 3] +
        0.6 * S[, 1] + 0.6 * S[, 2] + 0.6 * S[, 3]
nu3 = 1 + 0.7 * X[, 1] + 0.6 * X[, 2] + 0.5 * X[, 3] +
        0.1 * S[, 1] + 0.1 * S[, 2] + 0.1 * S[, 3]
nu4 = 1 + 0.4 * X[, 1] + 0.4 * X[, 2] + 0.4 * X[, 3] +
        0.4 * S[, 1] + 0.4 * S[, 2] + 0.4 * S[, 3]
norm = exp(nu1) + exp(nu2) + exp(nu3) + exp(nu4)
probs = matrix(c(exp(nu1) / norm, exp(nu2) / norm, exp(nu3) / norm,
                 exp(nu4) / norm), ncol = 4, byrow = FALSE)
multinomial = apply(probs, MARGIN = 1, function(x)
                sample(letters[1:4], size = 1, prob = x))
multinomial = factor(multinomial, labels = letters[1:4])
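The normalisation by norm above is a softmax over the four linear predictors, so each row of probs should be a valid probability vector. The sketch below illustrates this on a small hypothetical matrix of linear predictors; the seed, the random nu matrix and the use of rowSums() are assumptions chosen for brevity, not the original code.

```r
set.seed(3)  # arbitrary seed, not part of the original documentation

n = 5
# Hypothetical stand-in for cbind(nu1, nu2, nu3, nu4).
nu = matrix(rnorm(n * 4), nrow = n, ncol = 4)

# Softmax by row: the length-n vector rowSums(exp(nu)) is recycled down the
# columns, so entry [i, j] is divided by the sum of row i.
probs = exp(nu) / rowSums(exp(nu))
rowSums(probs)

# One class label per row, drawn with the row's probabilities.
classes = apply(probs, MARGIN = 1, function(x)
            sample(letters[1:4], size = 1, prob = x))
classes
```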
Author(s)
Marco Scutari
Examples
summary(fgrrm(response = vu.test$gaussian, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "gaussian"))
summary(fgrrm(response = vu.test$binomial, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "binomial"))
summary(fgrrm(response = vu.test$poisson, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "poisson"))
summary(fgrrm(response = vu.test$coxph, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "cox"))
summary(fgrrm(response = vu.test$multinomial, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "multinomial"))