fairml.cv {fairml}    R Documentation

Cross-Validation for Fair Models

Description

Cross-validation for the models in the fairml package.

Usage

fairml.cv(response, predictors, sensitive, method = "k-fold", ..., unfairness,
  model, model.args = list(), cluster)

cv.loss(x)
cv.unfairness(x)
cv.folds(x)

Arguments

response

a numeric vector, the response variable.

predictors

a numeric matrix or a data frame containing numeric and factor columns; the predictors.

sensitive

a numeric matrix or a data frame containing numeric and factor columns; the sensitive attributes.

method

a character string, one of "k-fold", "custom-folds" or "hold-out". See below for details.

...

additional arguments for the cross-validation method.

unfairness

a positive number in [0, 1], the proportion of the explained variance that can be attributed to the sensitive attributes.

model

a character string, the label of the model. Currently "nclm", "frrm", "fgrrm", "zlm" and "zlrm" are available.

model.args

a list of additional arguments passed to model estimation.

cluster

an optional cluster object from package parallel, to process folds or subsamples in parallel.

x

an object of class fair.kcv or fair.kcv.list.

Details

The following cross-validation methods are implemented:

k-fold: the data are split into k subsets of equal size. For each subset in turn, the model is fitted on the other k - 1 subsets and the loss function is computed on that subset. The loss estimates for the k subsets are then combined to give an overall loss for the data.

custom-folds: the data are manually partitioned by the user into subsets, which are then used as in k-fold cross-validation. Subsets are not constrained to have the same size, and every observation must be assigned to exactly one subset.

hold-out: k subsamples of size m are sampled independently without replacement from the data. For each subsample, the model is fitted on the remaining observations and the loss function is computed on the m observations in the subsample. The overall loss estimate is the average of the k loss estimates.

Cross-validation methods accept the following optional arguments:

k: a positive integer, the number of groups the data are split into (in k-fold cross-validation) or the number of times the data are split into training and test samples (in hold-out cross-validation).

m: a positive integer, the size of the test set in hold-out cross-validation.

runs: a positive integer, the number of times k-fold or hold-out cross-validation is repeated.

folds: a list in which each element contains the indexes of the observations in one fold; or, when runs is larger than 1, a list with one element per run, each of which is itself a list of folds. Used by custom-folds cross-validation.

If cross-validation is used with multiple runs, the overall loss is the average of the loss estimates from the different runs.

The predictive performance of the models is measured using the mean square error as the loss function.
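As a small illustration of how the loss is computed, the following sketch reproduces the mean square error on a single validation fold and the averaging over runs; the observed and predicted vectors are hypothetical placeholders, and kcv refers to the object created in the Examples below.

observed = c(1.2, -0.5, 0.3)    # responses in one validation fold
predicted = c(1.0, -0.4, 0.5)   # predictions from the model fitted on the training folds
fold.mse = mean((observed - predicted)^2)
# with multiple runs, the overall loss is the average of the per-run losses,
# e.g. (assuming a single loss value per run): mean(cv.loss(kcv))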

Value

fairml.cv() returns an object of class fair.kcv.list if runs is at least 2, and an object of class fair.kcv if runs is equal to 1.

cv.loss() returns a numeric vector or a numeric matrix containing the values of the loss function computed for each run of cross-validation.

cv.unfairness() returns a numeric vector containing the values of the unfairness criterion computed on the validation folds for each run of cross-validation.

cv.folds() returns a list containing the indexes of the observations in each of the cross-validation folds. In the case of k-fold cross-validation, if runs is larger than 1, each element of the list is itself a list with the indexes for the observations in each fold in each run.
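For example, assuming kcv is the fair.kcv.list created by the first call in the Examples below (10 runs of 10-fold cross-validation), the accessor functions can be used as follows.

class(kcv)            # "fair.kcv.list", because runs is larger than 1
folds = cv.folds(kcv)
length(folds)         # one element per run
length(folds[[1]])    # the folds of the first run
folds[[1]][[1]]       # indexes of the observations in the first fold of the first run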

Author(s)

Marco Scutari

Examples

# run 10-fold cross-validation, repeated 10 times, for the nclm model.
kcv = fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
        sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
        method = "k-fold", k = 10, runs = 10)
kcv
cv.loss(kcv)
cv.unfairness(kcv)

# run a second cross-validation with the same folds.
fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
        sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
        method = "custom-folds", folds = cv.folds(kcv))

# run cross-validation in parallel.
## Not run: 
library(parallel)
cl = makeCluster(2)
fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
  method = "k-fold", k = 10, runs = 10, cluster = cl)
stopCluster(cl)

## End(Not run)
