fairml.cv {fairml} | R Documentation
Cross-Validation for Fair Models
Description
Cross-validation for the models in the fairml package.
Usage
fairml.cv(response, predictors, sensitive, method = "k-fold", ..., unfairness,
model, model.args = list(), cluster)
cv.loss(x)
cv.unfairness(x)
cv.folds(x)
Arguments
response: a numeric vector, the response variable.

predictors: a numeric matrix or a data frame containing numeric and factor columns; the predictors.

sensitive: a numeric matrix or a data frame containing numeric and factor columns; the sensitive attributes.

method: a character string, either "k-fold", "custom-folds" or "hold-out" (see Details).

...: additional arguments for the cross-validation method.

unfairness: a positive number in [0, 1], the proportion of the explained variance that can be attributed to the sensitive attributes.

model: a character string, the label of the model to fit; see the fairml package documentation for the models currently supported.

model.args: a list of additional arguments passed to model estimation.

cluster: an optional cluster object from package parallel, used to process folds or subsamples in parallel.

x: an object of class fair.kcv or fair.kcv.list.
Details
The following cross-validation methods are implemented:
- k-fold: the data are split into k subsets of equal size. For each subset in turn, model is fitted on the other k - 1 subsets and the loss function is then computed on that subset. The loss estimates for each of the k subsets are then combined to give an overall loss for the data.
- custom-folds: the data are manually partitioned by the user into subsets, which are then used as in k-fold cross-validation. Subsets are not constrained to have the same size, and every observation must be assigned to one subset.
- hold-out: k subsamples of size m are sampled independently without replacement from the data. For each subsample, model is fitted on the remaining length(response) - m observations and the loss function is computed on the m observations in the subsample. The overall loss estimate is the average of the k loss estimates from the subsamples.
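As an illustration of the k-fold splitting described above (a base R sketch, not fairml code; n and k are arbitrary values chosen for the example):

```r
# hypothetical sketch: split n observations into k folds of equal size.
set.seed(42)
n <- 20
k <- 5
# shuffle the observation indices, then deal them out into k folds.
folds <- split(sample(n), rep(seq_len(k), length.out = n))
# every observation is assigned to exactly one fold, and here all folds
# have the same size because k divides n.
stopifnot(identical(sort(unlist(folds, use.names = FALSE)), 1:n))
stopifnot(all(lengths(folds) == n / k))
```

Each fold is then held out in turn as the validation set while the model is fitted on the remaining folds.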
Cross-validation methods accept the following optional arguments:
- k: a positive integer number, the number of groups into which the data will be split (in k-fold cross-validation) or the number of times the data will be split into training and test samples (in hold-out cross-validation).
- m: a positive integer number, the size of the test set in hold-out cross-validation.
- runs: a positive integer number, the number of times k-fold or hold-out cross-validation will be run.
- folds: a list in which each element corresponds to one fold and contains the indices of the observations assigned to that fold; or a list with one element for each run, in which each element is itself a list of the folds to be used for that run.
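For concreteness, the two shapes the folds argument can take might look like this (the indices below are arbitrary illustrations, not tied to any data set):

```r
# single run: a plain list of folds covering every observation exactly once.
single.run.folds <- list(1:5, 6:10, 11:15)
# multiple runs: one element per run, each itself a list of folds;
# folds within a run may have different sizes.
multi.run.folds <- list(list(1:5, 6:15),
                        list(1:10, 11:15))
# sanity check: each run's folds partition the same 15 observations.
stopifnot(all(sort(unlist(single.run.folds)) == 1:15))
stopifnot(all(sapply(multi.run.folds,
                     function(run) all(sort(unlist(run)) == 1:15))))
```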
If cross-validation is used with multiple runs, the overall loss is the average of the loss estimates from the different runs.
The predictive performance of the models is measured using the mean square error as the loss function.
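The mean square error itself is straightforward to compute; a minimal sketch (the mse() helper below is illustrative, not an exported fairml function):

```r
# mean square error between observed and predicted values.
mse <- function(observed, predicted) mean((observed - predicted)^2)
mse(c(1, 2, 3), c(1, 2, 5))  # (0 + 0 + 4) / 3
```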
Value
fairml.cv() returns an object of class fair.kcv.list if runs is at least 2, or an object of class fair.kcv if runs is equal to 1.

cv.loss() returns a numeric vector or a numeric matrix containing the values of the loss function computed for each run of cross-validation.

cv.unfairness() returns a numeric vector containing the values of the unfairness criterion computed on the validation folds for each run of cross-validation.

cv.folds() returns a list containing the indices of the observations in each of the cross-validation folds. In the case of k-fold cross-validation, if runs is larger than 1, each element of the list is itself a list with the indices of the observations in each fold of each run.
Author(s)
Marco Scutari
Examples
kcv = fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
method = "k-fold", k = 10, runs = 10)
kcv
cv.loss(kcv)
cv.unfairness(kcv)
# run a second cross-validation with the same folds.
fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
method = "custom-folds", folds = cv.folds(kcv))
# run cross-validation in parallel.
## Not run:
library(parallel)
cl = makeCluster(2)
fairml.cv(response = vu.test$gaussian, predictors = vu.test$X,
sensitive = vu.test$S, unfairness = 0.10, model = "nclm",
method = "k-fold", k = 10, runs = 10, cluster = cl)
stopCluster(cl)
## End(Not run)