shapleySubsetMc {sensitivity}		R Documentation
Estimation of Shapley effects from data using nearest neighbors method
Description
shapleySubsetMc implements the estimation of the Shapley effects from data, using a nearest-neighbours method to generate samples according to the conditional distributions of the inputs. It can be used with categorical inputs.
Usage
shapleySubsetMc(X, Y, Ntot = NULL, Ni = 3, cat = NULL, weight = NULL, discrete = NULL, noise = FALSE)
## S3 method for class 'shapleySubsetMc'
plot(x, ylim = c(0, 1), ...)
Arguments
X: a matrix or a data frame of the input sample.

Y: a vector of the output sample.

Ntot: an integer giving the approximate total cost wanted (total number of points for which the nearest neighbours are computed).

Ni: the number of nearest neighbours taken for each point.

cat: a vector giving the indices of the categorical input variables.

weight: a vector of the same length as cat, giving the weight of each categorical variable (see Details; if NULL, all categorical variables have weight 1).

discrete: a vector giving the indices of the input variables that are real, not categorical, but that can take the same value several times.

noise: logical. If FALSE (the default), the variable Y is a function of X.

x: a list of class "shapleySubsetMc", as returned by shapleySubsetMc.

ylim: y-coordinate plotting limits.

...: any other arguments for plotting.
Details
If weight = NULL, all the categorical variables will have the same weight 1.

If Ntot = NULL, the nearest neighbours will be computed for all the n*(2^p - 2) points, where n is the sample size and p is the number of inputs. The estimation can be very slow with this setting.
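For instance, with n = 500 observations of p = 4 inputs (as in the first example below), the default setting corresponds to 500 * (2^4 - 2) = 7000 nearest-neighbour computations. A minimal sketch to check this cost before calling the function, assuming the input sample is stored in X:

n <- nrow(X)     # sample size
p <- ncol(X)     # number of inputs
n * (2^p - 2)    # cost of the default Ntot = NULL setting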
Value
shapleySubsetMc returns a list of class "shapleySubsetMc", containing:

shapley: the Shapley effects estimates.

cost: the real total cost of these estimates: the total number of points for which the nearest neighbours were computed.

names: the labels of the input variables.
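These components can be accessed by name; a brief illustration, assuming Shap is an object returned by shapleySubsetMc:

Shap$shapley   # estimated Shapley effects, one per input
Shap$cost      # actual number of points for which nearest neighbours were computed
Shap$names     # labels of the input variables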
Author(s)
Baptiste Broto
References
B. Broto, F. Bachoc and M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal on Uncertainty Quantification, 8:693-716.
See Also
shapleyPermEx, shapleyPermRand, shapleyLinearGaussian, sobolrank, shapleysobol_knn
Examples
# First example: the linear Gaussian framework

# we generate a covariance matrix Sigma
p <- 4  # dimension
A <- matrix(rnorm(p^2), nrow = p, ncol = p)
Sigma <- t(A) %*% A  # a symmetric positive-definite covariance matrix
C <- chol(Sigma)
n <- 500  # sample size (put n = 2000 for more consistency)

Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z %*% C  # X is a Gaussian sample with zero mean and covariance Sigma
Y <- rowSums(X)

Shap <- shapleySubsetMc(X = X, Y = Y, Ntot = 5000)
plot(Shap)
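# In this linear Gaussian setting the exact Shapley effects can also be
# computed with shapleyLinearGaussian (see 'See Also'). The call below is a
# sketch assuming its Beta/Sigma interface; Beta = rep(1, p) because
# Y = rowSums(X) is linear with unit coefficients.
ShapExact <- shapleyLinearGaussian(Beta = rep(1, p), Sigma = Sigma)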
# Second example: the Sobol model with heterogeneous inputs

p <- 8  # dimension
A <- matrix(rnorm(p^2), nrow = p, ncol = p)
Sigma <- t(A) %*% A
C <- chol(Sigma)
n <- 500  # sample size (put n = 5000 for more consistency)

Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z %*% C  # X is a Gaussian sample with zero mean and covariance Sigma
# we create discrete and categorical variables
X[, 1] <- round(X[, 1] / 2)
X[, 2] <- X[, 2] > 2
X[, 4] <- -2 * round(X[, 4]) + 4
X[(X[, 6] > 0 & X[, 6] < 1), 6] <- 1

cat <- c(1, 2)       # we take X1 and X2 as categorical variables (with the discrete distance)
discrete <- c(4, 6)  # we indicate that X4 and X6 can take the same value several times

Y <- sobol.fun(X)

Ntot <- 2000  # put Ntot = 20000 for more consistency
Shap <- shapleySubsetMc(X = X, Y = Y, cat = cat, discrete = discrete, Ntot = Ntot, Ni = 10)
plot(Shap)
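# A brief check of the output (component names follow the 'Value' section);
# the actual cost can differ slightly from the requested Ntot.
Shap$cost
# the plot method accepts custom y-axis limits
plot(Shap, ylim = c(0, 0.5))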