shapley {ShapleyOutlier}    R Documentation

Decomposition of squared Mahalanobis distance using Shapley values.


The shapley function computes a p-dimensional vector that decomposes the squared Mahalanobis distance of x (with respect to mu and Sigma) into outlyingness contributions of the individual variables (Mayrhofer and Filzmoser 2022). The j-th coordinate of this vector is the average marginal contribution of the j-th variable to the squared Mahalanobis distance of the observation x.
If cells is provided, the Shapley values of x are computed with respect to a local reference point that is based on a cellwise prediction of each coordinate, using the information in the regular cells of x; see Mayrhofer and Filzmoser (2022).
If x is an n × p matrix, an n × p matrix is returned, containing the decomposition for each row.
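The "average marginal contribution" above can be checked numerically in base R. The sketch below does not call the package. It assumes the closed form phi_j = (x_j - mu_j) * sum_k (x_k - mu_k) * Omega[j, k] with Omega the inverse of Sigma, and it assumes a coalition S is valued as the squared Mahalanobis distance of a point that keeps x on S and is set to mu elsewhere. The helper names shapley_closed_form and shapley_brute_force are hypothetical and not part of ShapleyOutlier.

```r
# Hypothetical helper: assumed closed form of the decomposition.
shapley_closed_form <- function(x, mu, Sigma) {
  d <- x - mu
  as.numeric(d * solve(Sigma, d))  # elementwise product of d and Sigma^{-1} d
}

# Hypothetical helper: Shapley values by enumerating all coalitions.
# Exponential in p, so only suitable for small illustrative examples.
shapley_brute_force <- function(x, mu, Sigma) {
  p <- length(x)
  # Value of coalition S: squared distance when only coordinates in S deviate from mu.
  value <- function(S) {
    z <- mu
    z[S] <- x[S]
    mahalanobis(z, mu, Sigma)
  }
  phi <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (mask in 0:(2^(p - 1) - 1)) {
      # Decode the bits of mask into a subset of the other coordinates.
      bits <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
      S <- others[bits]
      # Standard Shapley weight |S|! (p - |S| - 1)! / p!
      w <- factorial(length(S)) * factorial(p - length(S) - 1) / factorial(p)
      phi[j] <- phi[j] + w * (value(c(S, j)) - value(S))
    }
  }
  phi
}

p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)

phi_closed <- shapley_closed_form(x, mu, Sigma)
phi_brute <- shapley_brute_force(x, mu, Sigma)

# Both routes should agree, and the contributions should sum to the squared distance.
print(all.equal(phi_closed, phi_brute))
print(all.equal(sum(phi_closed), mahalanobis(x, mu, Sigma)))
```

Under these assumptions the closed form avoids the exponential enumeration, which is what makes a per-variable decomposition practical for larger p.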


shapley(
  x,
  mu = NULL,
  Sigma = NULL,
  inverted = FALSE,
  method = "cellMCD",
  check = TRUE,
  cells = NULL
)



x           Data vector with p entries or data matrix with n × p entries, containing only numeric values.

mu          Either NULL (default) or the mean vector of x. If NULL, method is used for parameter estimation.

Sigma       Either NULL (default) or the p × p covariance matrix of x. If NULL, method is used for parameter estimation.

inverted    Logical. If TRUE, Sigma is assumed to contain the inverse of the covariance matrix.

method      Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.

check       Logical. If TRUE (default), inputs are checked before running the function, and an error is raised if one of the inputs is not as expected.

cells       Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. It must contain only zeros and ones, or TRUE/FALSE.
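As a base R illustration of what an inverted covariance argument means, stats::mahalanobis accepts the same kind of flag. The sketch below only shows that supplying the precomputed inverse gives the same squared distance; whether shapley handles inverted in exactly this way internally is an assumption.

```r
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)

# Squared Mahalanobis distance from the covariance matrix ...
d2_cov <- mahalanobis(x, mu, Sigma)
# ... and from its precomputed inverse, flagged with inverted = TRUE.
d2_inv <- mahalanobis(x, mu, solve(Sigma), inverted = TRUE)

print(all.equal(d2_cov, d2_inv))
```

Precomputing the inverse once is mainly useful when the same Sigma is reused across many calls.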



A p-dimensional vector (or an n × p matrix) containing the Shapley values (outlyingness scores) of x.


A p-dimensional vector (or an n × p matrix) containing the alternative reference points based on the regular cells of the original observations.


The non-centrality parameters for the chi-squared distribution, given by mahalanobis(mu_tilde, mu, Sigma).


Mayrhofer M, Filzmoser P (2022). “Multivariate outlier explanations using Shapley values and Mahalanobis distances.” doi:10.48550/ARXIV.2210.10063.


library(ShapleyOutlier)
library(MASS)  # provides mvrnorm()

## Without outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
shapley(x, mu, Sigma)
# Same decomposition, passing the inverse covariance matrix instead
phi <- shapley(x, mu, Sigma_inv, inverted = TRUE)

# Multiple observations
n <- 100; p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
# Contaminate 100 randomly chosen cells with the values -5 and 5
X[sample(1:(n * p), 100, FALSE)] <- rep(c(-5, 5), 50)
call_shapley <- shapley(X, mu, Sigma)
plot(call_shapley, subset = 20)

## Giving outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
call_shapley <- shapley(x, mu, Sigma_inv, inverted = TRUE,
                        method = "cellMCD", check = TRUE, cells = c(1, 1, 0, 0, 0))

# Multiple observations
n <- 100; p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
X[sample(1:(n * p), 100, FALSE)] <- rep(c(-5, 5), 50)
# Mark as outlying exactly those cells that differ from the clean data
call_shapley <- shapley(X, mu, Sigma, cells = (X_clean - X) != 0)
plot(call_shapley, subset = 20)

[Package ShapleyOutlier version 0.1.1 Index]