shapley {ShapleyOutlier} | R Documentation
Decomposition of squared Mahalanobis distance using Shapley values.
Description
The shapley function computes a p-dimensional vector containing the decomposition of the squared Mahalanobis distance of x (with respect to mu and Sigma) into outlyingness contributions of the individual variables (Mayrhofer and Filzmoser 2022). The value of the j-th coordinate of this vector represents the average marginal contribution of the j-th variable to the squared Mahalanobis distance of the individual observation x.
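To make "average marginal contribution" concrete, the following base-R sketch computes Shapley values by brute force over all coalitions of variables and compares them with a closed-form expression. It does not call the package and is not its implementation. The helper names value_fn and shapley_brute are hypothetical, and the value function (coordinates outside a coalition are set to their mean) is an assumption made for illustration.

```r
# Illustrative sketch in base R; not the ShapleyOutlier implementation.
# Assumption: the value of a coalition S is the squared Mahalanobis distance
# of the point that keeps x on S and equals mu on all other coordinates.
value_fn <- function(S, x, mu, Sigma_inv) {
  d <- numeric(length(x))
  d[S] <- x[S] - mu[S]
  sum(d * drop(Sigma_inv %*% d))
}

# Brute-force Shapley values: weighted average of marginal contributions
# of variable j over all subsets S of the remaining variables.
shapley_brute <- function(x, mu, Sigma) {
  p <- length(x)
  Sigma_inv <- solve(Sigma)
  phi <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (mask in 0:(2^(p - 1) - 1)) {
      # Decode the bits of 'mask' into a coalition S of the other variables.
      in_S <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
      S <- others[in_S]
      # Shapley weight |S|! (p - |S| - 1)! / p!
      w <- factorial(length(S)) * factorial(p - length(S) - 1) / factorial(p)
      gain <- value_fn(c(S, j), x, mu, Sigma_inv) -
        value_fn(S, x, mu, Sigma_inv)
      phi[j] <- phi[j] + w * gain
    }
  }
  phi
}

p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)

phi_brute <- shapley_brute(x, mu, Sigma)

# Closed form under the same assumption:
# elementwise product of (x - mu) and Sigma^{-1} (x - mu).
phi_closed <- (x - mu) * drop(solve(Sigma) %*% (x - mu))

all.equal(phi_brute, phi_closed)
all.equal(sum(phi_brute), mahalanobis(x, mu, Sigma))
```

The brute-force loop visits 2^(p - 1) coalitions per variable, so it is only practical for small p. The closed form needs a single matrix-vector product, and in both cases the contributions sum to the squared Mahalanobis distance.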
If cells is provided, Shapley values of x are computed with respect to a local reference point that is based on a cellwise prediction of each coordinate, using the information of the regular cells of x; see Mayrhofer and Filzmoser (2022).
If x is an n × p matrix, an n × p matrix is returned, containing the decomposition for each row.
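The row-wise case can be sketched in base R without the package. This is an illustration, not the package's code: it assumes the Shapley values of a row x take the closed form (x - mu) * Sigma^{-1} (x - mu) (elementwise product), which holds when coordinates outside a coalition are set to their mean. The small simulated data set is hypothetical.

```r
# Illustrative sketch in base R; not the ShapleyOutlier implementation.
set.seed(1)
n <- 6; p <- 3
mu <- rep(0, p)
Sigma <- matrix(0.5, p, p); diag(Sigma) <- 1
X <- matrix(rnorm(n * p), n, p)

D <- sweep(X, 2, mu)             # subtract mu from every row
Phi <- D * (D %*% solve(Sigma))  # n x p matrix, one decomposition per row

dim(Phi)
# Each row of Phi sums to the squared Mahalanobis distance of that observation.
all.equal(rowSums(Phi), mahalanobis(X, mu, Sigma))
```

Because Sigma is symmetric, row i of D %*% solve(Sigma) equals the transpose of Sigma^{-1} (x_i - mu), so one matrix product handles all rows at once.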
Usage
shapley(
  x,
  mu = NULL,
  Sigma = NULL,
  inverted = FALSE,
  method = "cellMCD",
  check = TRUE,
  cells = NULL
)
Arguments
x |
Data vector with |
mu |
Either |
Sigma |
Either |
inverted |
Logical. If |
method |
Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if |
check |
Logical. If |
cells |
Either |
Value
phi |
A |
mu_tilde |
A |
non_centrality |
The non-centrality parameters for the Chi-Squared distribution, given by |
References
Mayrhofer M, Filzmoser P (2022). “Multivariate outlier explanations using Shapley values and Mahalanobis distances.” doi:10.48550/ARXIV.2210.10063.
Examples
## Without outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0,1,2,2.3,2.5)
shapley(x, mu, Sigma)
# Same call, but passing the precomputed inverse covariance matrix
phi <- shapley(x, mu, Sigma_inv, inverted = TRUE)
plot(phi)
# Multiple observations
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
# Contaminate 100 randomly chosen cells with the values -5 and 5
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5,5),50)
call_shapley <- shapley(X, mu, Sigma)
plot(call_shapley, subset = 20)
## Giving outlying cells as input in the 'cells' argument
# Single observation
p <- 5
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0,1,2,2.3,2.5)
call_shapley <- shapley(x, mu, Sigma_inv, inverted = TRUE,
                        method = "cellMCD", check = TRUE, cells = c(1,1,0,0,0))
plot(call_shapley)
# Multiple observations
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0,p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X_clean <- X
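# Contaminate 100 randomly chosen cells with the values -5 and 5
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5,5),50)
# Flag the contaminated cells by comparing against the clean copy
call_shapley <- shapley(X, mu, Sigma, cells = (X_clean - X) != 0)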
plot(call_shapley, subset = 20)