fairadapt {fairadapt}    R Documentation

Fairadapt

Description

Implementation of fair data adaptation with quantile preservation (Plecko & Meinshausen 2019). Uses only plain R.

Usage

fairadapt(
  formula,
  prot.attr,
  adj.mat,
  train.data,
  test.data = NULL,
  cfd.mat = NULL,
  top.ord = NULL,
  res.vars = NULL,
  quant.method = rangerQuants,
  visualize.graph = FALSE,
  eval.qfit = NULL,
  ...
)

Arguments

formula

Object of class formula describing the response and the covariates.

prot.attr

A value of class character describing the binary protected attribute. Must be one of the entries of colnames(adj.mat).

adj.mat

Matrix of class matrix encoding the relationships in the causal graph. M[i,j] == 1L implies the existence of an edge from node i to node j. Must include all the variables appearing in the formula object. If adj.mat is set to NULL, the top.ord argument must be supplied instead.

train.data, test.data

Training data & testing data, both of class data.frame. Test data is by default NULL.

cfd.mat

Symmetric matrix of class matrix encoding the bidirected edges in the causal graph. M[i,j] == M[j,i] == 1L implies the existence of a bidirected edge between nodes i and j. Must include all the variables appearing in the formula object.

top.ord

A vector of class character describing the topological ordering of the causal graph. Default value is NULL, but this argument must be supplied if adj.mat is not specified. It must also include all the variables appearing in the formula object.

res.vars

A vector of class character listing all the resolving variables, which should not be changed by the adaptation procedure. Default value is NULL, corresponding to no resolving variables. Resolving variables should be a subset of the descendants of the protected attribute.

quant.method

A function choosing the method used for quantile regression. Default value is rangerQuants (using random forest quantile regression). Other implemented options are linearQuants and mcqrnnQuants. A custom function can also be supplied by the user, in which case a matching method for the S3 generic computeQuants must be provided as well.

visualize.graph

A logical indicating whether the causal graph should be plotted upon calling the fairadapt() function. Default value is FALSE.

eval.qfit

Argument indicating whether the quality of the quantile regression fit should be computed using cross-validation. Default value is NULL. If a positive integer is specified, it is interpreted as the number of folds used in the cross-validation procedure.

...

Additional arguments forwarded to the function passed as quant.method.
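The graph-related arguments above (adj.mat, top.ord and res.vars) are tied together by a few consistency requirements. The following base R sketch illustrates them on the four-node graph used in the Examples section. The helpers make_adj_mat(), topological_order() and descendants() are hypothetical names introduced here for illustration only; they are not part of the fairadapt package, and the package may perform its own checks differently.

```r
# Hypothetical helpers (NOT part of fairadapt), using base R only.

# Build an adjacency matrix from a named list mapping each parent to its children.
make_adj_mat <- function(nodes, children) {
  adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
                dimnames = list(nodes, nodes))
  for (parent in names(children)) {
    adj[parent, children[[parent]]] <- 1L  # edge parent -> child
  }
  adj
}

# Kahn-style topological ordering: repeatedly remove a node without parents.
topological_order <- function(adj) {
  ord <- character(0)
  remaining <- rownames(adj)
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    roots <- remaining[colSums(sub) == 0]
    if (length(roots) == 0) stop("the graph contains a cycle")
    ord <- c(ord, roots[1])
    remaining <- setdiff(remaining, roots[1])
  }
  ord
}

# All nodes reachable from `node` by following directed edges.
descendants <- function(adj, node) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    kids <- colnames(adj)[colSums(adj[frontier, , drop = FALSE]) > 0]
    frontier <- setdiff(kids, found)
    found <- c(found, frontier)
  }
  found
}

nodes <- c("gender", "edu", "test", "score")
adj <- make_adj_mat(nodes, list(gender = c("edu", "test"),
                                edu    = c("test", "score"),
                                test   = "score"))

ord  <- topological_order(adj)     # a candidate value for top.ord
desc <- descendants(adj, "gender") # admissible candidates for res.vars

# Assumed choice of resolving variable, for illustration only.
res_vars <- "test"
stopifnot(all(res_vars %in% desc))
```

Here adj has the same entries as uni_adj in the Examples section, ord could be passed as top.ord when no adjacency matrix is available, and the final check mirrors the requirement that resolving variables be descendants of the protected attribute.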

Details

The procedure takes the training and testing data as input, together with the causal graph given by an adjacency matrix and the list of resolving variables, which are kept fixed during the adaptation procedure. The procedure then calculates a fair representation of the data, after which any classification method can be used. There are, however, several valid training options yielding fair predictions, and the best of them can be chosen with cross-validation. For more details we refer the user to the original paper. Most of the running time is due to the quantile regression step using the ranger package.
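To give some intuition for what "quantile preservation" means, the sketch below adapts a single simulated variable in base R. It is a deliberately simplified, marginal illustration and not the package's implementation: fairadapt() fits quantile regressions conditional on the parents in the causal graph, whereas this sketch uses plain empirical distributions. The function adapt_marginal() and the simulated data are assumptions made for illustration.

```r
set.seed(1)
n <- 500
a <- rbinom(n, 1, 0.5)        # binary protected attribute
x <- rnorm(n, mean = 2 * a)   # group a == 1 is shifted upward

# Hypothetical helper (NOT part of fairadapt): map each non-baseline value
# to the baseline-group value that sits at the same empirical quantile.
adapt_marginal <- function(x, a, baseline = 0) {
  x_adapted <- x
  other <- a != baseline
  # Quantile of each non-baseline value within its own group.
  u <- ecdf(x[other])(x[other])
  # Value at that same quantile in the baseline group.
  x_adapted[other] <- quantile(x[!other], probs = u, names = FALSE)
  x_adapted
}

x_fair <- adapt_marginal(x, a)

# Gap between group means before and after the adaptation.
gap_before <- abs(mean(x[a == 1]) - mean(x[a == 0]))
gap_after  <- abs(mean(x_fair[a == 1]) - mean(x[a == 0]))
```

Baseline-group values are left unchanged, while each value in the other group keeps its relative position within its group but is expressed on the baseline group's scale, so the gap between the group means shrinks.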

Value

An object of class fairadapt, containing the original and adapted training and testing data, together with the causal graph and some additional meta-information.

References

Plecko, D. & Meinshausen, N. (2019). Fair Data Adaptation with Quantile Preservation.

Examples

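# Number of rows used for the training split and for the test split.
n_samp <- 200

# Causal graph: entry [i, j] == 1 encodes an edge from node i to node j.
uni_dim <- c(       "gender", "edu", "test", "score")
uni_adj <- matrix(c(       0,     1,      1,       0,
                           0,     0,      1,       1,
                           0,     0,      0,       1,
                           0,     0,      0,       0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# Adapt the data, treating gender as the protected attribute.
uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

# Print the resulting fairadapt object.
uni_ada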


[Package fairadapt version 0.2.7 Index]