
shapleySubsetMc {sensitivity}

R Documentation

Estimation of Shapley effects from data using nearest neighbors method

Description

shapleySubsetMc estimates the Shapley effects from data, using a nearest neighbors method to generate samples from the conditional distributions of the inputs. It can be used with categorical inputs.
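The nearest-neighbour principle can be illustrated with a short base-R sketch. Assuming that the quantity needed for a subset u of inputs is E[Var(Y | X_u)], the hypothetical helper nn_expected_cond_var() below replaces the conditional distribution of Y given X_u by the Y values of the Ni points closest to each sample point in the coordinates u, and averages their empirical variances. This only illustrates the idea. It is not the code used inside shapleySubsetMc, which also handles categorical and discrete inputs and limits the total cost through Ntot.

```r
# Hypothetical helper (not part of the sensitivity package):
# nearest-neighbour estimate of E[Var(Y | X_u)] for a subset u of input columns.
nn_expected_cond_var <- function(X, Y, u, Ni = 3) {
  Xu <- as.matrix(X)[, u, drop = FALSE]
  D <- as.matrix(dist(Xu))  # Euclidean distances computed on the columns in u only
  n <- nrow(Xu)
  cond_var <- numeric(n)
  for (k in seq_len(n)) {
    # the Ni points closest to point k in the coordinates u (point k itself included)
    nb <- order(D[k, ])[seq_len(Ni)]
    # empirical variance of Y over this neighbourhood approximates Var(Y | X_u = X_u[k])
    cond_var[k] <- var(Y[nb])
  }
  mean(cond_var)  # Monte Carlo average over the sample points
}

# Usage: the output depends on the first input only
set.seed(1)
Xsim <- matrix(runif(1000), ncol = 2)
Ysim <- Xsim[, 1]
nn_expected_cond_var(Xsim, Ysim, u = 1, Ni = 5)  # close to 0: fixing X1 fixes Y
nn_expected_cond_var(Xsim, Ysim, u = 2, Ni = 5)  # close to Var(Y) = 1/12
```

Ni must be at least 2 in this sketch, since a variance needs two values.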

Usage

shapleySubsetMc(X,Y, Ntot=NULL, Ni=3, cat=NULL, weight=NULL, discrete=NULL, noise=FALSE)
## S3 method for class 'shapleySubsetMc'
plot(x, ylim = c(0, 1), ...)

Arguments

`X`	a matrix or a dataframe of the input sample
`Y`	a vector of the output sample
`Ntot`	an integer giving the approximate desired cost, i.e. the total number of points for which the nearest neighbours are computed
`Ni`	the number of nearest neighbours taken for each point
`cat`	a vector giving the indices of the input categorical variables
`weight`	a vector of the same length as `cat` giving the weight of each categorical variable in the product distance
`discrete`	a vector giving the indices of the input variables that are real-valued (not categorical) but can take the same value several times
`noise`	logical. If FALSE (the default), Y is assumed to be a deterministic function of X
`x`	a list of class `"shapleySubsetMc"` storing the state of the sensitivity study (Shapley effects, cost, names of inputs)
`ylim`	y-coordinate plotting limits
`...`	any other arguments for plotting

Details

If weight = NULL, all the categorical variables will have the same weight 1.
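One plausible reading of the weighted product distance is sketched below. The exact formula used by shapleySubsetMc is not given on this page, so the combination rule is an assumption: non-categorical coordinates contribute squared differences, and each categorical coordinate contributes its weight when the two values differ (the discrete distance), with every weight equal to 1 by default as stated above. The function product_dist() is hypothetical.

```r
# Hypothetical sketch (not the package's code) of a weighted product distance:
# squared differences on non-categorical coordinates plus weight * 1{values differ}
# on categorical coordinates. The combination rule is an assumption.
product_dist <- function(x, y, cat = NULL, weight = NULL) {
  if (is.null(weight)) weight <- rep(1, length(cat))  # default: every categorical weight is 1
  cont <- setdiff(seq_along(x), cat)                  # non-categorical coordinates
  d2 <- sum((x[cont] - y[cont])^2)
  if (length(cat) > 0) d2 <- d2 + sum(weight * (x[cat] != y[cat]))
  sqrt(d2)
}

product_dist(c(0, 1, 0), c(3, 1, 4), cat = 2)               # categories agree: 5
product_dist(c(0, 1, 0), c(3, 2, 4), cat = 2, weight = 11)  # sqrt(25 + 11) = 6
```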

If Ntot = NULL, the nearest neighbours are computed for all n(2^p - 2) points, where n is the sample size and p the number of inputs. The estimation can then be very slow.
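The factor 2^p - 2 is the number of non-empty proper subsets of the p inputs, so the full cost presumably corresponds to n points for each such subset. The quick base-R computation below (full_cost() is a hypothetical helper name) shows how fast this grows for the sample sizes used in the Examples, which both set Ntot.

```r
# Cost when Ntot = NULL: n points for each of the 2^p - 2 non-empty proper subsets of inputs
full_cost <- function(n, p) n * (2^p - 2)

full_cost(500, 4)  # 7000, versus Ntot = 5000 in the first example
full_cost(500, 8)  # 127000, versus Ntot = 2000 in the second example
```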

Value

shapleySubsetMc returns a list of class "shapleySubsetMc", containing:

`shapley`	the Shapley effects estimates.
`cost`	the actual total cost of these estimates, i.e. the total number of points for which the nearest neighbours were computed.
`names`	the labels of the input variables.
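For reference, Shapley effects are usually defined as the Shapley values of a variance-based subset function, normalised by Var(Y). Assuming the `shapley` component follows this normalisation (the default `ylim = c(0, 1)` of the plot method is consistent with it), the effect of input i is

```latex
\mathrm{Sh}_i = \frac{1}{p\,\mathrm{Var}(Y)} \sum_{u \subseteq \{1,\dots,p\} \setminus \{i\}}
  \binom{p-1}{|u|}^{-1} \bigl( c(u \cup \{i\}) - c(u) \bigr),
\qquad c(u) = \mathrm{Var}\bigl(\mathbb{E}[Y \mid X_u]\bigr).
```

Replacing c(u) by E[Var(Y | X_{-u})] gives the same effects. When Y is a function of X (noise = FALSE), c(∅) = 0 and c of the full set equals Var(Y), so the effects sum to 1.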

Author(s)

Baptiste Broto

References

B. Broto, F. Bachoc, M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal on Uncertainty Quantification, 8:693-716.

Examples



# First example: the linear Gaussian framework
library(sensitivity)

# we generate a covariance matrix Sigma
p <- 4 #dimension
A <- matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma <- t(A)%*%A # Sigma = A^T A is (almost surely) positive definite
C <- chol(Sigma)
n <- 500 #sample size (put n=2000 for more consistency)

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z%*%C # X is a Gaussian vector with zero mean and covariance Sigma
Y=rowSums(X) 
Shap=shapleySubsetMc(X=X,Y=Y,Ntot=5000)
plot(Shap)
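In the linear Gaussian framework the exact Shapley effects have a closed form, which gives a reference for the estimates shown by plot(Shap). The base-R sketch below is an illustration, not a function of the package: the hypothetical closed_sobol_gauss() computes Var(E[Y | X_u]) / Var(Y) for Y = beta' X with X ~ N(0, Sigma) by Gaussian conditioning, and the hypothetical shapley_from_setfun() applies the Shapley formula by enumerating all 2^p subsets.

```r
# Closed Sobol index Var(E[Y | X_u]) / Var(Y) for Y = beta' X, X ~ N(0, Sigma).
closed_sobol_gauss <- function(u, beta, Sigma) {
  if (length(u) == 0) return(0)
  var_y <- drop(t(beta) %*% Sigma %*% beta)
  cov_uy <- Sigma[u, , drop = FALSE] %*% beta  # Cov(X_u, Y)
  drop(t(cov_uy) %*% solve(Sigma[u, u, drop = FALSE], cov_uy)) / var_y
}

# Shapley values of a set function v with v(empty set) = 0, enumerating all 2^p subsets.
shapley_from_setfun <- function(p, v) {
  sh <- numeric(p)
  for (mask in 0:(2^p - 1)) {
    u <- which(as.integer(intToBits(mask))[seq_len(p)] == 1L)  # subset encoded by mask
    v_u <- v(u)
    for (i in setdiff(seq_len(p), u)) {
      w <- 1 / (p * choose(p - 1, length(u)))  # Shapley weight |u|! (p-|u|-1)! / p!
      sh[i] <- sh[i] + w * (v(sort(c(u, i))) - v_u)
    }
  }
  sh
}

# Check on two correlated inputs with Y = X1 + X2: by symmetry both effects equal 0.5
Sigma2 <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
shapley_from_setfun(2, function(u) closed_sobol_gauss(u, c(1, 1), Sigma2))
```

With the objects of the first example, Y = rowSums(X) corresponds to beta = rep(1, p), so shapley_from_setfun(p, function(u) closed_sobol_gauss(u, rep(1, p), Sigma)) gives the exact effects to compare with Shap$shapley. Full enumeration is only practical for small p.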


# Second example: the Sobol model with heterogeneous inputs

p=8 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A
C=chol(Sigma)
n=500 #sample size (put n=5000 for more consistency)

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z%*%C # correlated Gaussian inputs with covariance Sigma

#we create discrete and categorical variables
X[,1]=round(X[,1]/2) 
X[,2]=X[,2]>2
X[,4]=-2*round(X[,4])+4
X[(X[,6]>0 &X[,6]<1),6]=1

cat=c(1,2)  # we choose to take X1 and X2 as categorical variables (with the discrete distance)
discrete=c(4,6) # we indicate that X4 and X6 can take several times the same value

Y=sobol.fun(X)
Ntot <- 2000 # put Ntot=20000 for more consistency
Shap=shapleySubsetMc(X=X,Y=Y, cat=cat, discrete=discrete, Ntot=Ntot, Ni=10)

plot(Shap)
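Before choosing `cat` and `discrete`, it can help to check which columns of X actually contain repeated values. The base-R helper below (repeated_cols() is a hypothetical name, not part of the package) lists them. Deciding which of these columns are categorical and which are real-valued remains a modelling choice, as in the second example where X1 and X2 are declared categorical and X4 and X6 discrete.

```r
# Hypothetical helper: indices of the columns of X in which some value occurs more than once
repeated_cols <- function(X) {
  X <- as.matrix(X)
  which(apply(X, 2, function(col) anyDuplicated(col) > 0))
}

set.seed(2)
Xdemo <- cbind(round(rnorm(50)), rnorm(50), rnorm(50) > 1)
repeated_cols(Xdemo)  # columns 1 and 3; column 2 is continuous
```

Applied to X from the second example, it should flag columns 1, 2, 4 and 6.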

[Package sensitivity version 1.30.0 Index]