fairadapt {fairadapt}

R Documentation

Fairadapt

Description

Implementation of fair data adaptation with quantile preservation (Plecko & Meinshausen 2019). Uses only plain R.

Usage

fairadapt(
  formula,
  prot.attr,
  adj.mat,
  train.data,
  test.data = NULL,
  cfd.mat = NULL,
  top.ord = NULL,
  res.vars = NULL,
  quant.method = rangerQuants,
  visualize.graph = FALSE,
  eval.qfit = NULL,
  ...
)

Arguments

`formula`	Object of class `formula` describing the response and the covariates.
`prot.attr`	A value of class `character` describing the binary protected attribute. Must be one of the entries of `colnames(adj.mat)`.
`adj.mat`	Matrix of class `matrix` encoding the relationships in the causal graph. `M[i,j] == 1L` implies the existence of an edge from node i to node j. Must include all the variables appearing in the formula object. If `adj.mat` is `NULL`, the `top.ord` argument must be supplied instead.
`train.data`, `test.data`	Training data & testing data, both of class `data.frame`. Test data is by default `NULL`.
`cfd.mat`	Symmetric matrix of class `matrix` encoding the bidirected edges in the causal graph. `M[i,j] == M[j,i] == 1L` implies the existence of a bidirected edge between nodes i and j. Must include all the variables appearing in the formula object.
`top.ord`	A vector of class `character` describing the topological ordering of the causal graph. Default value is `NULL`, but this argument must be supplied if `adj.mat` is not specified. It must also include all the variables appearing in the formula object.
`res.vars`	A vector of class `character` listing all the resolving variables, which should not be changed by the adaptation procedure. Default value is `NULL`, corresponding to no resolving variables. Resolving variables should be a subset of the descendants of the protected attribute.
`quant.method`	A function choosing the method used for quantile regression. Default value is `rangerQuants` (using random forest quantile regression). Other implemented options are `linearQuants` and `mcqrnnQuants`. A custom function can be supplied by the user here, and the associated method for the S3 generic `computeQuants` needs to be added.
`visualize.graph`	A `logical` indicating whether the causal graph should be plotted upon calling the `fairadapt()` function. Default value is `FALSE`.
`eval.qfit`	Argument indicating whether the quality of the quantile regression fit should be computed using cross-validation. Default value is `NULL`. A positive integer value is interpreted as the number of folds used in the cross-validation procedure.
`...`	Additional arguments forwarded to the function passed as `quant.method`.
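
The graph-related arguments above can be prepared in base R before calling `fairadapt()`. The following is a minimal sketch, not part of the package: `topo_order()` is a hypothetical helper that derives a topological ordering from an adjacency matrix, and the bidirected edge between `edu` and `test` is an illustrative assumption rather than a claim about any real data set.

```r
# Hypothetical helper (not part of fairadapt): derive a topological ordering
# from an adjacency matrix in which adj[i, j] == 1 encodes an edge i -> j.
topo_order <- function(adj) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj),
            identical(rownames(adj), colnames(adj)))
  remaining <- colnames(adj)
  ord <- character(0)
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    # Nodes without incoming edges from the remaining nodes come next.
    roots <- remaining[colSums(sub) == 0]
    if (length(roots) == 0) stop("graph contains a cycle")
    ord <- c(ord, roots)
    remaining <- setdiff(remaining, roots)
  }
  ord
}

vars <- c("gender", "edu", "test", "score")

# Directed edges: rows are parents, columns are children.
adj <- matrix(0L, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
adj["gender", c("edu", "test")] <- 1L
adj["edu", c("test", "score")] <- 1L
adj["test", "score"] <- 1L

# Bidirected edges go into a symmetric matrix of the same shape.
# The edu <-> test edge here is purely illustrative.
cfd <- matrix(0L, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
cfd["edu", "test"] <- cfd["test", "edu"] <- 1L

# Basic checks matching the argument descriptions.
stopifnot("gender" %in% colnames(adj), isSymmetric(cfd))

ord <- topo_order(adj)
```

Under these assumptions, `adj` could be passed as `adj.mat`, `cfd` as `cfd.mat`, and `ord` as `top.ord` when no adjacency matrix is available.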

Details

The procedure takes the training and testing data as input, together with the causal graph given by an adjacency matrix and the list of resolving variables, which are kept fixed during the adaptation procedure. It then computes a fair representation of the data, on which any classification method can be used. Several valid training options yield fair predictions, and the best of them can be chosen with cross-validation. For more details we refer the user to the original paper. Most of the running time is spent in the quantile regression step, which by default uses the ranger package.
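
The idea of quantile preservation can be illustrated with a deliberately simplified base R sketch. It is not the package's implementation: `fairadapt()` conditions on the causal parents of each variable and uses quantile regression, whereas this sketch ignores parents and works on a single variable. The simulated data and the helper `quantile_adapt()` are hypothetical.

```r
# Simplified illustration only; ignores the causal graph entirely.
set.seed(1)
a <- rep(c(0, 1), each = 100)     # hypothetical binary protected attribute
x <- rnorm(200, mean = 2 * a)     # variable whose distribution depends on a

# Hypothetical helper: map values of the non-baseline group onto the baseline
# distribution while keeping each unit's within-group quantile.
quantile_adapt <- function(x, a, baseline = 0) {
  out <- x
  other <- a != baseline
  # Within-group quantile of every non-baseline value.
  u <- ecdf(x[other])(x[other])
  # Value found at the same quantile in the baseline group.
  out[other] <- quantile(x[!other], probs = u, names = FALSE)
  out
}

x_fair <- quantile_adapt(x, a)
```

In this sketch the baseline group is left unchanged, and units in the other group keep their relative ordering while their values are drawn from the baseline distribution.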

Value

An object of class `fairadapt`, containing the original and adapted training and testing data, together with the causal graph and some additional meta-information.

References

Plecko, D. & Meinshausen, N. (2019). Fair Data Adaptation with Quantile Preservation.

Examples

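library(fairadapt)

n_samp <- 200

# Adjacency matrix of the causal graph: entry [i, j] == 1 encodes an edge
# from the row variable i to the column variable j.
uni_dim <- c(       "gender", "edu", "test", "score")
uni_adj <- matrix(c(       0,     1,      1,       0,
                           0,     0,      1,       1,
                           0,     0,      0,       1,
                           0,     0,      0,       0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)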

uni_ada <- fairadapt(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender"
)

uni_ada

[Package fairadapt version 0.2.7 Index]