MOE {ShapleyOutlier}

R Documentation

Detecting cellwise outliers using Shapley values based on local outlyingness.

Description

The `MOE` function flags outlying cells of `x`, which is either a data vector with `p` entries or a data matrix with `n \times p` entries containing only numeric values. It uses Shapley values for a given center `mu` and covariance matrix `Sigma`. It is a more sophisticated alternative to the `SCD` algorithm because it uses the information in the regular cells to derive an alternative reference point (Mayrhofer and Filzmoser 2022).
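The outlyingness scores are Shapley values of the squared Mahalanobis distance. Below is a minimal base-R sketch. It assumes a value function in which the cells outside a coalition are set to their `mu` values. Under that assumption, the Shapley values have the closed form phi_j = (x_j - mu_j) [Sigma^{-1}(x - mu)]_j, and they sum to the squared Mahalanobis distance. A brute-force enumeration over all coalitions is included as a cross-check. The helpers `shapley_md` and `shapley_brute` are hypothetical illustrations and are not the package's own `shapley` function.

```r
# Closed-form Shapley values of the squared Mahalanobis distance (sketch)
shapley_md <- function(x, mu, Sigma_inv) {
  d <- x - mu
  as.vector(d * (Sigma_inv %*% d))   # phi_j = d_j * (Sigma^{-1} d)_j
}

# Brute-force Shapley values: cells outside coalition S are set to mu,
# and v(S) is the resulting squared Mahalanobis distance
shapley_brute <- function(x, mu, Sigma_inv) {
  p <- length(x)
  v <- function(S) {
    d <- numeric(p)
    d[S] <- x[S] - mu[S]             # cells outside S contribute 0
    sum(d * (Sigma_inv %*% d))
  }
  phi <- numeric(p)
  for (mask in 0:(2^p - 1)) {
    S <- which(bitwAnd(mask, 2^(seq_len(p) - 1)) > 0)   # coalition from bit mask
    k <- length(S)
    for (j in setdiff(seq_len(p), S)) {
      w <- factorial(k) * factorial(p - k - 1) / factorial(p)
      phi[j] <- phi[j] + w * (v(c(S, j)) - v(S))        # weighted marginal gain
    }
  }
  phi
}

# Example with the single observation used in the Examples section
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
phi <- shapley_md(x, mu, Sigma_inv)
```

The brute-force version enumerates all 2^p coalitions, so it is only practical for small `p`. The closed form is what makes the decomposition cheap for every observation.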

Usage

MOE(
  x,
  mu,
  Sigma,
  Sigma_inv = NULL,
  step_size = 0.1,
  min_deviation = 0,
  max_step = NULL,
  local = TRUE,
  max_iter = 1000,
  q = 0.99,
  check_outlyingness = FALSE,
  check = TRUE,
  cells = NULL,
  method = "cellMCD"
)

Arguments

`x`	Data vector with `p` entries or data matrix with `n \times p` entries containing only numeric entries.
`mu`	Either `NULL` or the mean vector of `x`. If `NULL`, `method` is used for parameter estimation.
`Sigma`	Either `NULL` or the `p \times p` covariance matrix of `x`. If `NULL`, `method` is used for parameter estimation.
`Sigma_inv`	Either `NULL` (default) or the `p \times p` inverse of `Sigma`. If `NULL`, the inverse of `Sigma` is computed using `solve(Sigma)`.
`step_size`	Numeric. Step size for the imputation of outlying cells, with `step_size` `\in [0,1]`. Defaults to `0.1`.
`min_deviation`	Numeric. Detection threshold, with `min_deviation` `\in [0,1]`. Defaults to `0`.
`max_step`	Either `NULL` (default) or an integer. The maximum number of steps in each iteration. If `NULL`, `max_step` `= p`.
`local`	Logical. If `TRUE` (default), the non-central chi-squared distribution is used to determine the cutoff value based on `mu_tilde`.
`max_iter`	Integer. The maximum number of iterations. Defaults to `1000`.
`q`	Numeric. The quantile of the chi-squared distribution used for the detection and imputation of outliers. Defaults to `0.99`.
`check_outlyingness`	Logical. If `TRUE`, the outlyingness is rechecked after applying `min_deviation`. Defaults to `FALSE`.
`check`	Logical. If `TRUE` (default), the inputs are checked before the function runs, and an error is raised if any input is not as expected.
`cells`	Either `NULL` (default) or a vector/matrix of the same dimension as `x` that indicates the outlying cells. It must contain only zeros and ones, or `TRUE`/`FALSE`.
`method`	Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if `mu` and/or `Sigma` are not provided.
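The arguments `q` and `local` control the cutoff against which a squared Mahalanobis distance is compared. Below is a minimal base-R sketch of the two kinds of cutoff. It assumes that the central case uses the chi-squared quantile with `p` degrees of freedom. The non-centrality value is a hypothetical placeholder, because this page does not describe how `MOE` derives it from `mu_tilde`.

```r
p <- 5
q <- 0.99

# Central chi-squared cutoff with p degrees of freedom
cutoff_central <- qchisq(q, df = p)

# Non-central cutoff; ncp_example is a hypothetical placeholder,
# not the value MOE computes from mu_tilde
ncp_example <- 2
cutoff_local <- qchisq(q, df = p, ncp = ncp_example)

# Flag observations whose squared Mahalanobis distance exceeds the central cutoff
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- rbind(c(0, 1, 2, 2.3, 2.5),
           c(0.1, 0.2, 0.1, 0.0, 0.2))
md2 <- mahalanobis(X, center = mu, cov = Sigma)
flagged <- md2 > cutoff_central
```

A positive non-centrality shifts the distribution to the right, so the local cutoff is larger than the central one at the same `q`.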

Value

A list of class `shapley_algorithm` (`new_shapley_algorithm`) containing the following elements:

`x`	A `p`-dimensional vector (or an `n \times p` matrix) containing the imputed data.
`phi`	A `p`-dimensional vector (or an `n \times p` matrix) containing the Shapley values (outlyingness scores) of `x`; see `shapley`.
`mu_tilde`	A `p`-dimensional vector (or an `n \times p` matrix) containing the alternative reference points based on the regular cells of the original observations.
`x_original`	A `p`-dimensional vector (or an `n \times p` matrix) containing the original data.
`non_centrality`	The non-centrality parameters for the chi-squared distribution.
`x_history`	A list with `n` elements, each containing the path of how the original data vector was modified.
`phi_history`	A list with `n` elements, each containing the Shapley values corresponding to `x_history`.
`mu_tilde_history`	A list with `n` elements, each containing the alternative reference points corresponding to `x_history`.
`S_history`	A list with `n` elements, each containing the indices of the outlying cells in each iteration.
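The element `mu_tilde` holds reference points that are built from the regular cells. One natural construction is the Gaussian conditional mean of the outlying cells `S` given the regular cells `R`: mu_S + Sigma_SR Sigma_RR^{-1} (x_R - mu_R). It is used below purely as an assumption and is not necessarily the exact definition inside `MOE`. In this sketch, regular cells keep their observed values. The helper `cond_ref` is hypothetical.

```r
# Conditional-mean reference point (sketch; an assumption, not the package code)
cond_ref <- function(x, mu, Sigma, S) {
  R <- setdiff(seq_along(x), S)       # indices of the regular cells
  ref <- x                            # regular cells keep their observed values
  if (length(S) == 0) return(ref)
  if (length(R) == 0) {               # nothing to condition on: fall back to mu
    ref[S] <- mu[S]
    return(ref)
  }
  ref[S] <- as.vector(mu[S] + Sigma[S, R, drop = FALSE] %*%
                        solve(Sigma[R, R, drop = FALSE], x[R] - mu[R]))
  ref
}

# Treat the last cell of the Examples observation as outlying
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)
ref <- cond_ref(x, mu, Sigma, S = 5)
```

With strong positive correlations, the reference value for the flagged cell is pulled towards the level of the regular cells instead of towards `mu`.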

References

Mayrhofer M, Filzmoser P (2022). “Multivariate outlier explanations using Shapley values and Mahalanobis distances.” doi:10.48550/ARXIV.2210.10063.

Examples

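library(ShapleyOutlier)
# Single observation with p = 5 strongly correlated variables
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
# Pass the precomputed inverse so that MOE does not call solve() again
MOE_x <- MOE(x = x, mu = mu, Sigma = Sigma, Sigma_inv = Sigma_inv)
plot(MOE_x)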

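# Data matrix with n = 100 observations of p = 10 variables
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
# Contaminate 100 randomly chosen cells (without replacement) with -5 or 5
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5, 5), 50)
MOE_X <- MOE(X, mu, Sigma)
plot(MOE_X, subset = 20)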

[Package ShapleyOutlier version 0.1.1 Index]