shapley {ShapleyOutlier}    R Documentation

Decomposition of squared Mahalanobis distance using Shapley values.

Description

The shapley function computes a p-dimensional vector containing the decomposition of the squared Mahalanobis distance of x (with respect to mu and Sigma) into outlyingness contributions of the individual variables (Mayrhofer and Filzmoser 2022). The value of the j-th coordinate of this vector represents the average marginal contribution of the j-th variable to the squared Mahalanobis distance of the individual observation x.
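The "average marginal contribution" reading can be checked by brute force on a small example. The following base R sketch is an illustration only: the helpers value() and shapley_brute() are hypothetical and not part of ShapleyOutlier, the value function (coordinates outside a coalition are set to mu) is an assumption, and the closed form (x - mu) * Sigma^{-1} (x - mu) is taken as the result of the cited paper rather than as a statement about the package internals.

```r
# Base R only; value() and shapley_brute() are hypothetical helpers.
p <- 4
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
x <- c(0, 1, 2, 2.5)

# Assumed value function: squared Mahalanobis distance of x when only the
# coordinates in the coalition S keep their values and the rest are set to mu.
value <- function(S) {
  z <- mu
  z[S] <- x[S]
  mahalanobis(z, center = mu, cov = Sigma)
}

# Weighted average of the marginal contributions of variable j over all
# coalitions formed by the remaining variables.
shapley_brute <- function(j) {
  others <- setdiff(seq_len(p), j)
  # Every row of 'grid' marks one subset of the remaining variables.
  grid <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p - 1)))
  total <- 0
  for (r in seq_len(nrow(grid))) {
    S <- others[grid[r, ]]
    s <- length(S)
    w <- factorial(s) * factorial(p - s - 1) / factorial(p)
    total <- total + w * (value(c(S, j)) - value(S))
  }
  total
}
phi_brute <- vapply(seq_len(p), shapley_brute, numeric(1))

# Closed form (assumed from the cited paper): elementwise product of the
# centered observation with Sigma^{-1} times the centered observation.
d <- x - mu
phi_closed <- as.numeric(d * solve(Sigma, d))

all.equal(phi_brute, phi_closed)                       # both routes agree
all.equal(sum(phi_closed), mahalanobis(x, mu, Sigma))  # sums to the squared distance
```

The enumeration visits 2^(p-1) coalitions per variable, so it is only practical for small p; the closed form needs a single linear solve.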
If cells is provided, the Shapley values of x are computed with respect to a local reference point that is based on a cellwise prediction of each coordinate, using the information in the regular cells of x (see Mayrhofer and Filzmoser 2022).
If x is an n \times p matrix, an n \times p matrix is returned, containing the decomposition for each row.
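For matrix input, the row-wise decomposition can be sketched in base R without calling the package. This assumes the closed form (x - mu) * Sigma^{-1} (x - mu) from the cited paper and no cells argument; the object names D and Phi are chosen for illustration.

```r
# Base R sketch of a row-wise decomposition for an n x p matrix.
set.seed(1)
n <- 6; p <- 3
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- matrix(rnorm(n * p), n, p)

D <- sweep(X, 2, mu)              # center every row at mu
Phi <- D * (D %*% solve(Sigma))   # row i holds the decomposition of X[i, ]

dim(Phi)                          # same shape as X
# Each row sums to the squared Mahalanobis distance of that observation.
all.equal(rowSums(Phi), mahalanobis(X, mu, Sigma))
```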

Usage

shapley(
  x,
  mu = NULL,
  Sigma = NULL,
  inverted = FALSE,
  method = "cellMCD",
  check = TRUE,
  cells = NULL
)

Arguments

x

Numeric data vector with p entries, or numeric data matrix with n \times p entries.

mu

Either NULL (default) or mean vector of x. If NULL, method is used for parameter estimation.

Sigma

Either NULL (default) or the p \times p covariance matrix of x. If NULL, method is used for parameter estimation.

inverted

Logical (default FALSE). If TRUE, Sigma is assumed to contain the inverse of the covariance matrix.

method

Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.

check

Logical. If TRUE (default), inputs are checked before running the function and an error message is returned if one of the inputs is not as expected.

cells

Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. The matrix must contain only zeros and ones, or TRUE/FALSE.
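The inputs for inverted and cells can be prepared in base R before calling shapley. The sketch below is illustrative: the contaminated positions and the names cells_01 and md_plain are assumptions, and the comparison uses stats::mahalanobis, which has its own inverted argument, to show that passing the inverse covariance describes the same distance.

```r
# Base R sketch: building the inverse covariance and a cell indicator.
set.seed(1)
n <- 5; p <- 3
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X_clean <- matrix(rnorm(n * p), n, p)
X <- X_clean
X[c(2, 9)] <- c(-5, 5)            # contaminate two cells (illustrative choice)

# With inverted = TRUE, the 'Sigma' argument carries the inverse covariance.
Sigma_inv <- solve(Sigma)
md_plain    <- mahalanobis(X, mu, Sigma)
md_inverted <- mahalanobis(X, mu, Sigma_inv, inverted = TRUE)
all.equal(md_plain, md_inverted)

# 'cells' has the same dimension as X and flags the outlying cells,
# either as TRUE/FALSE or as ones and zeros.
cells <- X != X_clean
cells_01 <- cells * 1
which(cells, arr.ind = TRUE)      # positions of the flagged cells
```

Sigma_inv would then be passed as Sigma together with inverted = TRUE, and cells (or cells_01) as the cells argument.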

Value

phi

A p-dimensional vector (or an n \times p matrix) containing the Shapley values (outlyingness scores) of x.

mu_tilde

A p-dimensional vector (or an n \times p matrix) containing the alternative reference points based on the regular cells of the original observations.

non_centrality

The non-centrality parameters for the chi-squared distribution, given by mahalanobis(mu_tilde, mu, Sigma).
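The non-centrality expression can be evaluated directly with stats::mahalanobis. In the sketch below the matrix mu_tilde is a made-up set of reference points for illustration; in practice it would be the mu_tilde component described above.

```r
# Base R sketch: non-centrality for hypothetical reference points.
p <- 3
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1

# Hypothetical reference points, one per row (not output of the package).
mu_tilde <- rbind(c(0, 0, 0),
                  c(0.5, 0.4, 0),
                  c(1, 0, -1))

# One value per row; it is zero where mu_tilde coincides with mu.
non_centrality <- mahalanobis(mu_tilde, center = mu, cov = Sigma)
non_centrality
```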

References

Mayrhofer M, Filzmoser P (2022). “Multivariate outlier explanations using Shapley values and Mahalanobis distances.” doi:10.48550/ARXIV.2210.10063.

Examples

## Without outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0,1,2,2.3,2.5)
shapley(x, mu, Sigma)
phi <- shapley(x, mu, Sigma_inv, inverted = TRUE)
plot(phi)

# Multiple observations
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5,5),50)
call_shapley <- shapley(X, mu, Sigma)
plot(call_shapley, subset = 20)


## Giving outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0,1,2,2.3,2.5)
call_shapley <- shapley(x, mu, Sigma_inv, inverted = TRUE,
                        method = "cellMCD", check = TRUE, cells = c(1,1,0,0,0))
plot(call_shapley)

# Multiple observations
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5,5),50)
call_shapley <- shapley(X, mu, Sigma, cells = (X_clean - X)!=0)
plot(call_shapley, subset = 20)

[Package ShapleyOutlier version 0.1.1 Index]