SCD {ShapleyOutlier}    R Documentation

Detecting cellwise outliers using Shapley values.

Description

The SCD function flags outlying cells in a data vector with p entries, or in a data matrix with n × p entries, for a given center mu and covariance matrix Sigma, using the Shapley value (Mayrhofer and Filzmoser 2022). The data x must contain only numeric entries.
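The idea of scoring cells with Shapley values can be illustrated in base R. This is a minimal sketch, assuming the closed form reported in Mayrhofer and Filzmoser (2022), where each coordinate's contribution to the squared Mahalanobis distance is the centered value times the matching entry of Sigma_inv %*% (x - mu). It is not the internal code of SCD, and the names centered, phi and md2 are chosen for illustration only.

```r
# Illustrative inputs: 5 equicorrelated variables and one observation.
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)

# Assumed closed form: phi_j = (x_j - mu_j) * [Sigma^{-1} (x - mu)]_j
Sigma_inv <- solve(Sigma)
centered <- x - mu
phi <- centered * as.numeric(Sigma_inv %*% centered)

# The contributions add up to the squared Mahalanobis distance of x.
md2 <- mahalanobis(x, center = mu, cov = Sigma)
print(phi)
print(all.equal(sum(phi), md2))
```

Because the contributions sum to the squared Mahalanobis distance, a large entry of phi points to a cell that accounts for much of the observation's outlyingness.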

Usage

SCD(
  x,
  mu,
  Sigma,
  Sigma_inv = NULL,
  step_size = 0.1,
  min_deviation = 0,
  max_step = NULL,
  max_iter = 1000,
  q = 0.99,
  method = "cellMCD",
  check = TRUE,
  cells = NULL
)

Arguments

x

Data vector with p entries or data matrix with n × p entries containing only numeric entries.

mu

Either NULL or the mean vector of x. If NULL, method is used for parameter estimation.

Sigma

Either NULL or the p × p covariance matrix of x. If NULL, method is used for parameter estimation.

Sigma_inv

Either NULL (default) or the p × p inverse of Sigma. If NULL, the inverse of Sigma is computed using solve(Sigma).

step_size

Numeric. Step size for the imputation of outlying cells, with step_size ∈ [0,1]. Defaults to 0.1.

min_deviation

Numeric. Detection threshold, with min_deviation ∈ [0,1]. Defaults to 0.

max_step

Either NULL (default) or an integer. The maximum number of steps in each iteration. If NULL, max_step = p.

max_iter

Integer. The maximum number of iterations. Defaults to 1000.

q

Numeric. The quantile of the Chi-squared distribution for detection and imputation of outliers. Defaults to 0.99.

method

Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.

check

Logical. If TRUE (default), inputs are checked before running the function and an error message is returned if one of the inputs is not as expected.

cells

Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. The matrix must contain only zeros and ones, or TRUE/FALSE.
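Several of the arguments above can be prepared in base R before calling SCD. The following is a sketch under stated assumptions: the inputs are made up for illustration, the names cutoff and cells_mask are hypothetical, and using p degrees of freedom for the Chi-squared quantile is an assumption about how q is applied, not something this page states.

```r
# Illustrative inputs (not taken from the package).
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
X <- matrix(0, nrow = 3, ncol = p)

# Sigma_inv: the same quantity SCD computes with solve(Sigma) when
# Sigma_inv = NULL; precomputing it allows reuse across calls.
Sigma_inv <- solve(Sigma)

# q: a Chi-squared quantile. Assumption: degrees of freedom equal p.
cutoff <- qchisq(0.99, df = p)

# cells: a 0/1 mask with the same dimension as x.
# Here the cell in row 2, column 4 is marked as outlying.
cells_mask <- matrix(0L, nrow = nrow(X), ncol = ncol(X))
cells_mask[2, 4] <- 1L

print(cutoff)
print(cells_mask)
```

These objects would then be passed as Sigma_inv = Sigma_inv and cells = cells_mask, alongside x, mu and Sigma.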

Value

A list of class shapley_algorithm (new_shapley_algorithm) containing the following:

x

A p-dimensional vector (or an n × p matrix) containing the imputed data.

phi

A p-dimensional vector (or an n × p matrix) containing the Shapley values (outlyingness scores) of x; see shapley.

x_original

A p-dimensional vector (or an n × p matrix) containing the original data.

x_history

The path along which the original data vector was modified.

phi_history

The Shapley values corresponding to x_history.

S_history

The indices of the outlying cells in each iteration.
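The elements listed above can be read from the returned list with `$`. The sketch below assumes the ShapleyOutlier package is installed and skips the call otherwise. The names has_pkg and res are chosen for illustration, and the inputs repeat the first example on this page.

```r
# Illustrative only: has_pkg and res are names chosen for this sketch.
has_pkg <- requireNamespace("ShapleyOutlier", quietly = TRUE)

if (has_pkg) {
  p <- 5
  mu <- rep(0, p)
  Sigma <- matrix(0.9, p, p)
  diag(Sigma) <- 1
  x <- c(0, 1, 2, 2.3, 2.5)

  res <- ShapleyOutlier::SCD(x = x, mu = mu, Sigma = Sigma)

  print(res$x_original)  # data as supplied
  print(res$x)           # data after imputation of flagged cells
  print(res$phi)         # Shapley values (outlyingness scores)

  # Positions where the imputed data differ from the original data.
  print(which(res$x != res$x_original))
}
```

Comparing x with x_original is one simple way to see which cells were altered; x_history, phi_history and S_history record the intermediate iterations.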

References

Mayrhofer M, Filzmoser P (2022). “Multivariate outlier explanations using Shapley values and Mahalanobis distances.” doi:10.48550/ARXIV.2210.10063.

Examples

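# Single observation with p = 5 equicorrelated variables
library(ShapleyOutlier)
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)  # precomputed inverse, passed to SCD below
x <- c(0, 1, 2, 2.3, 2.5)
SCD_x <- SCD(x = x, mu = mu, Sigma = Sigma, Sigma_inv = Sigma_inv)
plot(SCD_x)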

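# Data matrix with n = 100 observations and 100 contaminated cells
library(MASS)
set.seed(1)
n <- 100
p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
# Overwrite 100 randomly chosen cells with alternating values -5 and 5
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5, 5), 50)
SCD_X <- SCD(X, mu, Sigma)
plot(SCD_X, subset = 20)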

[Package ShapleyOutlier version 0.1.1 Index]