R: Permutation Test from Outlier Scores

pt_from_os {dsos}

R Documentation

Permutation Test from Outlier Scores

Description

Test for no adverse shift with outlier scores. Like goodness-of-fit testing, this two-sample comparison takes the training (outlier) scores, os_train, as the reference. The method checks whether the test scores, os_test, are worse off relative to the training set.

Usage

pt_from_os(os_train, os_test, n_pt = 2000)

Arguments

`os_train`	Outlier scores in training (reference) set.
`os_test`	Outlier scores in test set.
`n_pt`	The number of permutations.

Details

The null distribution of the test statistic is based on n_pt permutations. For speed, this is implemented as a sequential Monte Carlo test with the simctest package; see Gandy (2009) for details. The prefix pt refers to permutation test. This approach does not rely on the asymptotic null distribution of the test statistic, which makes it the recommended approach for small samples. The test statistic is the weighted AUC (WAUC).
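
The permutation idea can be illustrated with a minimal base R sketch. This is not the implementation in dsos: it runs a plain (non-sequential) permutation test, and the difference in mean scores stands in for the WAUC, whose definition is not given here. The names permutation_p_value and stat_fn are hypothetical, and the sketch assumes that higher outlier scores are worse.

```r
# Plain permutation test on outlier scores (illustrative only, not the dsos code).
# `stat_fn` is a stand-in statistic (difference in mean scores), NOT the WAUC.
permutation_p_value <- function(os_train, os_test, n_pt = 2000,
                                stat_fn = function(train, test) mean(test) - mean(train)) {
  observed <- stat_fn(os_train, os_test)
  pooled <- c(os_train, os_test)
  n_train <- length(os_train)
  permuted <- replicate(n_pt, {
    # Randomly relabel the pooled scores as "training" or "test".
    idx <- sample.int(length(pooled), size = n_train)
    stat_fn(pooled[idx], pooled[-idx])
  })
  # One-sided p-value: large values of the statistic mean worse test scores.
  # Adding 1 to numerator and denominator keeps the p-value strictly positive.
  p_value <- (1 + sum(permuted >= observed)) / (1 + n_pt)
  list(statistic = observed, p_value = p_value)
}

set.seed(12345)
res <- permutation_p_value(rnorm(100), rnorm(100), n_pt = 500)
res$p_value
```

A sequential Monte Carlo test, as described in Gandy (2009), can stop before all n_pt permutations are drawn; the sketch above always draws all of them.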

Value

A named list of class outlier.test containing:

statistic: observed WAUC statistic
seq_mct: sequential Monte Carlo test, when applicable
p_value: p-value
outlier_scores: outlier scores from training and test set
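
As a hedged usage sketch, the elements listed above can be read from the returned list with the $ operator. The element names are taken from this page; the example assumes the dsos package is installed and is skipped otherwise.

```r
# Assumes the dsos package is installed; the block is skipped otherwise.
if (requireNamespace("dsos", quietly = TRUE)) {
  set.seed(12345)
  result <- dsos::pt_from_os(os_train = rnorm(100), os_test = rnorm(100))
  print(result$statistic)      # observed WAUC statistic
  print(result$p_value)        # p-value of the permutation test
  str(result$outlier_scores)   # outlier scores from training and test set
}
```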

Notes

The outlier scores should all mimic out-of-sample behaviour. Make sure that the training scores are not in-sample, and thus biased (overfitted), while the test scores are out-of-sample. This mismatch (in-sample versus out-of-sample scores) voids the validity of the test. A simple fix is to get the training scores from an independent (fresh) validation set. This follows the train/validation/test sample-splitting convention; the validation set is then effectively the reference set (or distribution).
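
The validation-set fix can be sketched as follows. The simulated data, the split sizes and the scorer (squared Mahalanobis distance from the training centre) are assumptions made for illustration; any outlier scorer fitted on the training split alone would play the same role. The call to pt_from_os runs only if dsos is installed.

```r
set.seed(12345)
# Hypothetical data: 300 reference rows and 100 test rows with two features.
x_ref <- matrix(rnorm(600), ncol = 2)
x_test <- matrix(rnorm(200), ncol = 2)

# Split the reference data: fit the scorer on the training split and
# keep a fresh validation split to produce the reference scores.
is_train <- seq_len(nrow(x_ref)) <= 200
x_train <- x_ref[is_train, , drop = FALSE]
x_val <- x_ref[!is_train, , drop = FALSE]

# Illustrative scorer: squared Mahalanobis distance from the training centre.
center <- colMeans(x_train)
covariance <- cov(x_train)
os_val <- mahalanobis(x_val, center, covariance)
os_test <- mahalanobis(x_test, center, covariance)

# Both sets of scores are now out-of-sample; the validation scores
# take the place of os_train as the reference.
if (requireNamespace("dsos", quietly = TRUE)) {
  shift_test <- dsos::pt_from_os(os_train = os_val, os_test = os_test)
  print(shift_test)
}
```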

References

Kamulete, V. M. (2022). Test for non-negligible adverse shifts. In The 38th Conference on Uncertainty in Artificial Intelligence. PMLR.

Gandy, A. (2009). Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. Journal of the American Statistical Association, 104(488), 1504-1511.

Examples


library(dsos)
set.seed(12345)
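# Training and test scores are drawn from the same distribution,
# so no adverse shift is expected.
os_train <- rnorm(n = 100)
os_test <- rnorm(n = 100)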
null_test <- pt_from_os(os_train, os_test)
null_test

[Package dsos version 0.1.2 Index]