sensiHSIC {sensitivity} | R Documentation |
Sensitivity Indices based on the Hilbert-Schmidt Independence Criterion (HSIC)
Description
sensiHSIC
allows to conduct global sensitivity analysis (GSA) in many different contexts thanks to several sensitivity measures based on the Hilbert-Schmidt independence criterion (HSIC). The so-called HSIC sensitivity indices depend on the kernels which are affected to the input variables Xi
as well as on the kernel which is affected to the output object Y
. For each random entity, a reproducing kernel Hilbert space (RKHS) is associated to the chosen kernel and allows to represent probability distributions in an appropriate function space. The influence of Xi
on Y
is then measured through the distance between the joint probability distribution (true impact of Xi
on Y
in the numerical model) and the product of marginal distributions (no impact of Xi
on Y
) after embedding those distributions into a bivariate RKHS. Such a GSA approach has three main advantages:
The input variables
Xi
may be correlated.Any kind of mathematical object is supported (provided that a kernel function is available).
Accurate estimation is possible even in presence of very few data (no more than a hundred of input-output samples).
In sensiHSIC
, each input variable Xi
is expected to be scalar (either discrete or continous). On the contrary, a much wider collection of mathematical objects are supported for the output variable Y
. In particular, Y
may be:
A scalar output (either discrete or continous). If so, one single kernel family is selected among the kernel collection.
A low-dimensional vector output. If so, a kernel is selected for each output variable and the final output kernel is built by tensorization.
A high-dimensional vector output or a functional output. In this case, the output data must be seen as time series observations. Three different methods are proposed.
A preliminary dimension reduction may be performed. In order to achieve this, a principal component analysis (PCA) based on the empirical covariance matrix helps identify the first terms of the Kharunen-Loeve expansion. The final output kernel is then built in the reduced subspace where the functional data are projected.
The dynamic time warping (DTW) algorithm may be combined with a translation-invariant kernel. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
The global alignment kernel (GAK) may be directly applied on the functional data. Unlike the DTW kernel, it was specifically designed to deal with time series and to measure the impact of one input variable on the shape of the output curve.
Many variants of the original HSIC indices are now available in sensiHSIC
.
-
Normalized HSIC indices (R2-HSIC)
The original HSIC indices defined in Gretton et al. (2005) may be hard to interpret because they do not admit a universal upper bound. A first step to overcome this difficulty was enabled by Da Veiga (2015) with the definition of the R2-HSIC indices. The resulting sensitivity indices can no longer be greater than1
. -
Target HSIC indices (T-HSIC)
They were thought by Marrel and Chabridon (2021) to meet the needs of target sensitivity analysis (TSA). The idea is to measure the impact of each input variableXi
on a specific part of the output distribution (for example the upper tail). To achieve this, a weight functionw
is applied onY
before computing HSIC indices. -
Conditional HSIC indices (C-HSIC)
They were thought by Marrel and Chabridon (2021) to meet the needs of conditional sensitivity analysis (CSA). The idea is to measure the impact of each input variableXi
onY
when a specific event occurs. This conditioning event is detected on the output variableY
and its amplitude is taken into account thanks to a weight functionw
. -
HSIC-ANOVA indices
To improve the interpretability of HSIC indices, Da Veiga (2021) came up with an ANOVA-like decomposition that allows to establish a strict separation of main effects and interaction effects in the HSIC paradigm. The first-order and total-order HSIC-ANOVA indices are then defined just as the first-order and total-order Sobol' indices. However, this framework only holds if two major assumptions are verified: the input variablesXi
must be mutually independent and all input kernels must belong to the very restrained class of ANOVA kernels.
As most sensitivity measures, HSIC indices allow to rank the input variables Xi
according to the influence they have on the output variable Y
. They can also be used for a screening purpose, that is to distinguish between truly influential input variables and non-influential input variables. The user who is interested in this topic is invited to consult the documentation of the function testHSIC
.
Usage
sensiHSIC(model = NULL, X, target = NULL, cond = NULL,
kernelX = "rbf", paramX = NA,
kernelY = "rbf", paramY = NA,
estimator.type = "V-stat",
nboot = 0, conf = 0.95,
anova = list(obj = "no", is.uniform = TRUE),
sensi = NULL,
save.GM = list(KX = TRUE, KY = TRUE), ...)
## S3 method for class 'sensiHSIC'
tell(x, y = NULL, ...)
## S3 method for class 'sensiHSIC'
print(x, ...)
## S3 method for class 'sensiHSIC'
plot(x, ylim = c(0, 1), ...)
Arguments
model |
A function, or a statistical model with a |
X |
A
|
target |
A list of options to perform TSA. At least,
|
cond |
A list of options to perform CSA. At least,
|
kernelX |
A string or a vector of
For each input variable, available choices include In addition, let us assume that all input variables are uniformly distributed on
|
paramX |
A scalar value or a vector of
|
kernelY |
A string, a vector of To deal with a scalar output or a low-dimensional vector output, it is advised to select one kernel per output dimension and to tensorize all selected kernels. In this case,
Have a look at To deal with a high-dimensional vector output or a functional output, it is advised to reduce dimension or to use a dedicated kernel. In this case,
|
paramY |
A scalar value or a vector of values with output kernel parameters.
In other cases, Case 1:
Case 2:
Case 3:
Case 4:
|
estimator.type |
A string specifying the kind of estimator used for HSIC indices. Available choices include |
nboot |
Number of bootstrap replicates. |
conf |
A scalar value (between |
anova |
A list of parameters to achieve an ANOVA-like decomposition based on HSIC indices. At least,
|
sensi |
An object of class
|
save.GM |
A list of parameters indicating whether Gram matrices have to be saved. The list
|
x |
An object of class |
y |
A |
ylim |
A vector of two values specifying the |
... |
Any other arguments for |
Details
Let (Xi,Y)
be an input-output pair. The kernels assigned to Xi
and Y
are respectively denoted by Ki
and KY
.
For many global sensitivity measures, the influence of Xi
on Y
is measured in the light of the probabilistic dependence that exists within the input-output pair (Xi,Y)
. For this, a dissimilarity measure is applied between the joint probability distribution (true impact of Xi
and Y
in the numerical model) and the product of marginal distributions (no impact of Xi
on Y
). For instance, Borgonovo's sensitivity measure is built upon the total variation distance between those two probability distributions. See Borgonovo and Plischke (2016) for further details.
The HSIC-based sensitivity measure can be understood in this way since the index HSIC(Xi,Y)
results from the application of the Hilbert-Schmidt independence criterion (HSIC) on the pair (Xi,Y)
. This criterion is nothing but a special kind of dissimilarity measure between the joint probability distribution and the product of marginal distributions. This dissimilarity measure is called the maximum mean discrepancy (MMD) and its definition relies on the selected kernels Ki
and KY
. According to the theory of reproducing kernels, every kernel K
is related to a reproducing kernel Hilbert space (RKHS).Then, if K
is affected to a random variable Z
, any probability distribution describing the random behavior of Z
may be represented within the induced RKHS. In this setup, the dissimilarity between the joint probability distribution and the product of marginal distributions is then measured through the squared norm of their images into the bivariate RKHS. The user is referred to Gretton et al. (2006) for additional details on the mathematical construction of HSIC indices.
In practice, it may be difficult to understand how HSIC(Xi,Y)
measures dependence within (Xi,Y)
. An alternative definition relies on the concept of feature map. Let us recall that the value taken by a kernel function can always be seen as the scalar product of two feature functions lying in a feature space. Definition 1 in Gretton et al. (2005) introduces HSIC(Xi,Y)
as the Hilbert-Schmidt norm of a covariance-like operator between random features. For this reason, having access to the input and output feature maps may help identify the dependence patterns captured by HSIC(Xi,Y)
.
Kernels must be chosen very carefully. There exists a wide variety of kernels but only a few f them meet the needs of GSA. As HSIC(Xi,Y)
is supposed to be a dependence measure, it must be equal to 0
if and only if Xi
and Y
are independent. A sufficient condition to enable this equivalence is to take two characteristic kernels. The reader is referred to Fukumizu et al. (2004) for the mathematical definition of a characteristic kernel and to Sriperumbur et al. (2010) for an overview of the major related results. In particular:
The Gaussian kernel, the Laplace kernel, the Matern
3/2
kernel and the Matern5/2
kernel (all defined onR^2
) are characteristic.The transformed versions of the four abovementioned kernels (all defined on
[0,1]^2
) are characteristic.All Sobolev kernels (defined on
[0,1]^2
) are characteristic.The categorical kernel (defined on any discrete probability space) is characteristic.
Lemma 1 in Gretton et al. (2005) provides a third way of defining HSIC(Xi,Y)
. Since the associated formula is only based on three expectation terms, the corresponding estimation procedures are very simple and they do not ask for a large amount of input-output samples to be accurate. Two kinds of estimators may be used for HSIC(Xi,Y)
: the V-statistic estimator (which is non negative, biased and asymptotically unbiased) or the U-statistic estimator (unbiased). For both estimators, the computational complexity is O(n^2)
where n
is the sample size.
The user must always keep in mind the key steps leading to the estimation of HSIC(Xi,Y)
:
Input samples are simulated and the corresponding output samples are computed with the numerical model.
An input kernel
Ki
and an output kernelKY
are selected.-
In case of target sensitivity analysis: output samples are transformed by means of a weight function
w
. The input and output Gram matrices are constructed.
-
In case of conditional sensitivity analysis: conditioning weights are computed by means of a weight function
w
. The final estimate is computed. It depends on the selected estimator type (either a U-statistic or a V-statistic).
Kernel functions for random variables
All what follows is written for a scalar output Y
but the same is true for any scalar input Xi
.
Let D
denote the support of the output probability distribution. A kernel is a symmetric and positive definite function defined on the domain D
. Different kernel families are available in sensiHSIC
.
To deal with continuous probability distributions on
R
, one can use:The covariance kernel of the fractional Browian motion (
"dcov"
), the inverse multiquadratic kernel ("invmultiquad"
), the exponential kernel ("laplace"
), the dot-product kernel ("linear"
), the Matern3/2
kernel ("matern3"
), the Matern5/2
kernel ("matern5"
), the rationale quadratic kernel ("raquad"
) and the Gaussian kernel ("rbf"
).
To deal with continuous probability distributions on
[0,1]
, one can use:Any of the abovementioned kernel (restricted to
[0,1]
).The transformed exponential kernel (
"laplace_anova"
), the transformed Matern3/2
kernel ("matern3_anova"
), the transformed Matern5/2
kernel ("matern5_anova"
), the transformed Gaussian kernel ("rbf_anova"
), the Sobolev kernel with smoothness parameterr=1
("sobolev1"
) and the Sobolev kernel with smoothness parameterr=2
("sobolev2"
).
To deal with any discrete probability distribution, the categorical kernel (
"categ"
) must be used.
Two kinds of kernels must be distinguished:
-
Parameter-free kernels: the dot-product kernel (
"linear"
), the Sobolev kernel with smoothness parameterr=1
("sobolev1"
) and the Sobolev kernel with smoothness parameterr=2
("sobolev2"
). -
One-parameter kernels: the categorical kernel (
"categ"
), the covariance kernel of the fractional Brownian motion kernel ("dcov"
), the inverse multiquadratic kernel ("invmultiquad"
), the exponential kernel ("laplace"
), the transformed exponential kernel ("laplace_anova"
), the Matern3/2
kernel ("matern3"
), the transformed Matern3/2
kernel ("matern3_anova"
), the Matern5/2
kernel ("matern5"
), the transformed Matern5/2
kernel ("matern5_anova"
), the rationale quadratic kernel ("raquad"
), the Gaussian kernel ("rbf"
) and the transformed Gaussian kernel ("rbf_anova"
).
A major issue related to one-parameter kernels is how to set the parameter. It mainly depends on the role played by the parameter in the kernel expression.
For translation-invariant kernels and their ANOVA variants (that is all one-parameter kernels except
"categ"
and"dcov"
), the parameter may be interpreted as a correlation length (or a scale parameter). The rule of thumb is to compute the empirical standard deviation of the provided samples.For the covariance kernel of the fractional Brownian motion (
"dcov"
), the parameter is an exponent. Default value is1
.For the categorical kernel (
"categ"
), the parameter has no physical sense. It is just a kind of binary encoding.-
0
means the user wants to use the basic categorical kernel. -
1
means the user wants to use the weighted variant of the categorical kernel.
-
How to deal with a low-dimensional vector output?
Let us assume that the output vector Y
is composed of q
random variables Y1,...,Yq
.
A kernel Kj
is affected to each output variable Yj
and this leads to embed the j
-th output probability distribution in a RKHS denoted by Hj
. Then, the tensorization of H1,...,Hq
allows to build the final RKHS, that is the RKHS where the q
-variate output probability distribution describing the overall random behavior of Y
will be embedded. In this situation:
The final output kernel is the tensor product of all output kernels.
The final output Gram matrix is the Hadamard product of all output Gram matrices.
Once the final output Gram matrix is built, HSIC indices can be estimated, just as in the case of a scalar output.
How to deal with a high-dimensional vector output or a functional output?
In sensiHSIC
, three different methods are proposed in order to compute HSIC-based sensitivity indices in presence of functional outputs.
Dimension reduction
This approach was initially proposed by Da Veiga (2015). The key idea is to approximate the random functional output by the first terms of its Kharunen-Loeve expansion. This can be achived with a principal component analysis (PCA) that is carried out on the empirical covariance matrix.
The eigenvectors (or principal directions) allow to approximate the (deterministic) functional terms involved in the Kharunen-Loeve decomposition.
The eigenvalues allow to determine how many principal directions are sufficient in order to accurately represent the random function by means of its truncated Kharunen-Loeve expansion. The key idea behind dimension reduction is to keep as few principal directions as possible while preserving a prescribed level of explained variance.
The principal components are the coordinates of the functional output in the low-dimensional subspace resulting from PCA. There are computed for all output samples (time series observations). See Le Maitre and Knio (2010) for more detailed explanations.
The last step consists in constructing a kernel in the reduced subspace. One single kernel family is selected and affected to all principal directions. Moreover, all kernel parameters are computed automatically (with appropriate rules of thumb). Then, several strategies may be considered.
The initial method described in Da Veiga (2015) is based on a direct tensorization. One can also decide to sum kernels.
This approach was improved by El Amri and Marrel (2021). For each principal direction, a weight coefficient (equal the ratio between the eigenvalue and the sum of all selected eigenvalues) is computed.
The principal components are multiplied by their respective weight coefficients before summing kernels or tensorizing kernels.
The kernels can also be directly applied on the principal components before being linearly combined according to the weight coefficients.
In sensiHSIC, all these strategies correspond to the following specifications in kernelY
:
-
Direct tensorization:
kernelY=list(method="PCA", combi="prod", position="nowhere")
-
Direct sum:
kernelY=list(method="PCA", combi="sum", position="nowhere")
-
Rescaled tensorization:
kernelY=list(method="PCA", combi="prod", position="intern")
-
Rescaled sum:
kernelY=list(method="PCA", combi="sum", position="intern")
-
Weighted linear combination:
kernelY=list(method="PCA", combi="sum", position="extern")
Dynamic Time Warping (DTW)
The DTW algorithm developed by Sakoe and Chiba (1978) can be combined with a translation-invariant kernel in order to create a kernel function for times series. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
Suitable translation-invariant kernels include the inverse multiquadratic kernel ("invmultiquad"
), the exponential kernel ("laplace"
), the Matern 3/2
kernel ("matern3"
), the Matern 5/2
kernel ("matern5"
), the rationale quadratic kernel ("raquad"
) and the Gaussian kernel ("rbf"
).
The user is warned against the fact that DTW-based kernels are not positive definite functions. As a consequence, many theoretical properties do not hold anymore for HSIC indices.
For faster computations, sensiHSIC
is using the function dtw_dismat
from the package incDTW
.
Global Alignment Kernel (GAK)
Unlike DTW-based kernels, the GAK is a positive definite function. This time-series kernel was originally introduced in Cuturi et al. (2007) and further investigated in Cuturi (2011). It was used to compute HSIC indices on a simplified compartmental epidemiological model in Da Veiga (2021).
For faster computations, sensiHSIC
is using the function gak
from the package dtwclust
.
In sensiHSIC
, two GAK-related parameters may be tuned by the user with paramY
. They exactly correspond to the arguments sigma
and window.size
in the function gak
.
About normalized HSIC indices (R2-HSIC)
No doubt interpretability is the major drawback of HSIC indices. This shortcoming led Da Veiga (2021) to introduce a normalized version of HSIC(Xi,Y)
. The so-called R2-HSIC index is thus defined as the ratio between HSIC(Xi,Y)
and the square root of a normalizing constant equal to HSIC(Xi,Xi)*HSIC(Y,Y)
.
This normalized sensitivity measure is inspired from the distance correlation measure proposed by Szekely et al. (2007) and the resulting sensitivity indices are easier to interpret since they all fall in the interval [0,1]
.
About target HSIC indices (T-HSIC)
T-HSIC indices were designed by Marrel and Chabridon (2021) for TSA. They are only defined for a scalar output. Vector and functional outputs are not supported. The main idea of TSA is to measure the influence of each input variable Xi
on a modified version of Y
. To do so, a preliminary mathematical transform w
(called the weight function) is applied on Y
. The collection of HSIC indices is then estimated with respect to w(Y)
. Here are two examples of situations where TSA is particularly relevant:
How to measure the impact of
Xi
on the upper values taken byY
(for example the values above a given thresholdT
)?To answer this question, one may take
w(Y)=Y*1_{Y>T}
(zero-thresholding).
This can be specified insensiHSIC
withtarget=list(c=T, type="zeroTh", upper=TRUE)
.
How to measure the influence of
Xi
on the occurrence of the event{Y>T}
?To answer this question, one may take
w(Y)=1_{Y<T}
(indicator-thresholding).
This can be specified insensiHSIC
withtarget=list(c=T, type="indicTh", upper=FALSE)
.
In Marrel and Chabridon (2021), the two situations described above are referred to as "hard thresholding". To avoid using discontinuous weight functions, "smooth thresholding" may be used instead.
Spagnol et al. (2019): logistic transformation on both sides of the threshold
T
.Marrel and Chabridon (2021): exponential transformation above or below the threshold
T
.
These two smooth relaxation functions depend on a tuning parameter that helps control smoothness. For further details, the user is invited to consult the documentation of the function weightTSA
.
Remarks:
When
type="indicTh"
(indicator-thesholding),w(Y)
becomes a binary random variable. Accordingly, the output kernel selected inkernelY
must be the categorical kernel.In the spirit of R2-HSIC indices, T-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of
HSIC(Xi,Xi)*HSIC(w(Y),w(Y))
.-
T-HSIC indices can be very naturally combined with the HSIC-ANOVA decomposition proposed by Da Veiga (2021). As a consequence, the arguments
target
andanova
insensiHSIC
can be enabled simultaneously. Compared with basic HSIC indices, there are three main differences: the input variables must be mutually independent, ANOVA kernels must be used for all input variables and the output of interest isw(Y)
. -
T-HSIC indices can be very naturally combined with the tests of independence proposed in
testHSIC
. In this context, the null hypothesis isH0
: "Xi
andw(Y)
are independent".
About conditional HSIC indices (C-HSIC)
C-HSIC indices were designed by Marrel and Chabridon (2021) for CSA. They are only defined for a scalar output. Vector and functional outputs are not supported. The idea is to measure the impact of each input variable Xi
on Y
when a specific event occurs. This conditioning event is defined on Y
thanks to a weight function w
. In order to compute the conditioning weights, w
is applied on the output samples and an empirical normalization is carried out (so that the overall sum of conditioning weights is equal to 1
). The conditioning weights are then combined with the simulated Gram matrices in order to estimate C-HSIC indices. All formulas can be found in Marrel and Chabridon (2021). Here is an exemple of a situation where CSA is particularly relevant:
Let us imagine that the event
{Y>T}
coincides with a system failure.
How to measure the influence ofXi
onY
when failure occurs?To answer this question, one may take
w(Y) = 1_{Y>T}
(indicator-thresholding).
This can be specified insensiHSIC
withcond=list(c=T, type="indicTh", upper=TRUE)
.
The three other weight functions proposed for TSA (namely "zeroTh"
, "logistic"
and "exp1side"
) can also be used but the role they play is less intuitive to understand. See Marrel and Chabridon (2021) for better explanations.
Remarks:
Unlike what is pointed out for TSA, when
type="thresholding"
, the output of interestY
remains a continuous random variable. The categorical kernel is thus inappropriate. A continuous kernel must be used instead.In the spirit of R2-HSIC indices, C-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of
C-HSIC(Xi,Xi)*C-HSIC(Y,Y)
.Only V-statistics are supported to estimate C-HSIC indices. The reason is because the normalized version of C-HSIC indices cannot always be estimated with U-statistics. In particular, the estimates of
C-HSIC(Xi,Xi)*C-HSIC(Y,Y)
may be negative.-
C-HSIC indices cannot be combined with the HSIC-ANOVA decomposition proposed in Da Veiga (2021). In fact, the conditioning operation is feared to introduce statistical dependence among input variables, which forbids using HSIC-ANOVA indices. As a consequence, the arguments
cond
andanova
insensiHSIC
cannot be enabled simultaneously. -
C-HSIC indices can harly be combined with the tests of inpendence proposed in
testHSIC
. This is only possible iftype="indicTh"
. In this context, the null hypothesis isH0
: "Xi
andY
are independent if the event described incond
occurs".
About HSIC-ANOVA indices
In comparison with HSIC indices, R2-HSIC indices are easier to interpret. However, in terms of interpretability, Sobol' indices remain much more convenient since they can be understood as shares of the total output variance. Such an interpretation is made possible by the Hoeffding decomposition, also known as ANOVA decomposition.
It was proved in Da Veiga (2021) that an ANOVA-like decomposition can be achived for HSIC indices under certain conditions:
The input variables must be mutually independent (which was not required to compute all other kinds of HSIC indices).
-
ANOVA kernels must be assigned to all input variables.
This ANOVA setup allows to establish a strict separation between main effects and interaction effects in the HSIC sense. The first-order and total-order HSIC-ANOVA indices are then defined in the same fashion than first-order and total-order Sobol' indices. It is worth noting that the HSIC-ANOVA normalizing constant is equal to HSIC(X,Y)
and is thus different from the one used for R2-HSIC indices.
For a given probability measure P
, an ANOVA kernel K
is a kernel that can rewritten 1+k
where k
is an orthogonal kernel with respect to P
. Among the well-known parametric families of probability distributions and kernel functions, there are very few examples of orthogonal kernels. One example is given by Sobolev kernels when there are matched with the uniform probability measure on [0,1]. See Wahba et al. (1995) for further details on Sobolev kernels.
Moreover, several strategies to construct orthogonal kernels from non-orthogonal kernels are recalled in Da Veiga (2021). One of them consists in translating the feature map so that the resulting kernel becomes centered at the prescribed probability measure P
. This can be done analytically for some basic kernels (Gaussian, exponential, Matern 3/2
and Matern 5/2
) when P
is the uniform measure on [0,1]
. See Section 9 in Ginsbourger et al. (2016) for the corresponding formulas.
In sensiHSIC
, ANOVA kernels are only available for the uniform probability measure on [0,1]
. This includes the Sobolev kernel with parameter r=1
("sobolev1"
), the Sobolev kernel with parameter r=2
("sobolev2"
), the transformed Gaussian kernel ("rbf_anova"
), the transformed exponential kernel ("laplace_anova"
), the transformed Matern 3/2
kernel ("matern3_anova"
) and the transformed Matern 5/2
kernel ("matern5_anova"
).
As explained above, the HSIC-ANOVA indices can only be computed if all input variables are uniformly distributed on [0,1]
. Because of this limitation, a preliminary reformulation is needed if the GSA problem includes other kinds of input probability distributions. The probability integral transform (PIT) must be applied on each input variable Xi
. In addition, all quantile functions must be encapsulated in the numerical model, which may lead to reconsider the way model
is specified. In sensiHSIC
, if check=TRUE
is selected in anova
, it is checked that all input samples lie in [0,1]
. If this is not the case, a non-parametric rescaling (based on empirical distribution functions) is operated.
HSIC-ANOVA indices can be used for TSA. The only difference with GSA is the use of a weight function w
. On the contrary, CSA cannot be conducted with HSIC-ANOVA indices. Indeed, the conditioning operation is feared to introduce statistical independence among the input variables, which prevents using the HSIC-ANOVA approach.
Value
sensiHSIC
returns a list of class "sensiHSIC"
. It contains all the input arguments detailed before, except sensi
which is not kept. It must be noted that some of them might have been altered, corrected or completed.
kernelX |
A vector of |
paramX |
A vector of |
kernelY |
A vector of
|
paramY |
A vector of values with output kernel parameters. Case 1:
Case 2:
Case 3:
Case 4:
|
More importantly, the list of class "sensiHSIC"
contains all expected results (output samples, sensitivity measures and conditioning weights).
call |
The matched call. |
y |
A |
HSICXY |
The estimated HSIC indices. |
S |
The estimated R2-HSIC indices (also called normalized HSIC indices). |
weights |
Only if |
Depending on what is specified in anova
, the list of class "sensiHSIC"
may also contain the following objects:
FO |
The estimated first-order HSIC-ANOVA indices. |
TO |
The estimated total-order HSIC-ANOVA indices. |
TO.num |
The estimated numerators of total-order HSIC-ANOVA indices. |
denom |
The estimated common denominator of HSIC-ANOVA indices. |
Author(s)
Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.
References
Borgonovo, E. and Plischke, E. (2016), Sensitivity analysis: a review of recent advances, European Journal of Operational Research, 248(3), 869-887.
Cuturi, M., Vert, J. P., Birkenes, O. and Matsui, T. (2007), A kernel for time series based on global alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 2, pp. II-413), IEEE.
Cuturi, M. (2011), Fast global alignment kernels, Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 929-936).
Da Veiga, S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283-1305.
Da Veiga, S. (2021). Kernel-based ANOVA decomposition and Shapley effects: application to global sensitivity analysis, arXiv preprint arXiv:2101.05487.
El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https:/cea.hal.science/cea-03406956/
Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.
Ginsbourger, D., Roustant, O., Schuhmacher, D., Durrande, N. and Lenz, N. (2016), On ANOVA decompositions of kernels and Gaussian random field paths, Monte Carlo and Quasi-Monte Carlo Methods (pp. 315-330), Springer, Cham.
Gretton, A., Bousquet, O., Smola, A., and Scholkopf, B. (2005), Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory (pp. 63-77), Springer, Berlin, Heidelberg.
Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B. and Smola, A. (2006), A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19.
Le Maitre, O. and Knio, O. M. (2010), Spectral methods for uncertainty quantification with applications to computational fluid dynamics, Springer Science & Business Media.
Marrel, A. and Chabridon, V. (2021), Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214, 107711.
Sakoe, H. and Chiba, S. (1978), Dynamic programming algorithm optimization for spoken word recognition, IEEE International Conference on Acoustics, Speech and Signal, 26(1), 43-49.
Spagnol, A., Riche, R. L. and Veiga, S. D. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA Journal on Uncertainty Quantification, 7(2), 417-443.
Sriperumbudur, B., Fukumizu, K. and Lanckriet, G. (2010), On the relation between universality, characteristic kernels and RKHS embedding of measures, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 773-780). JMLR Workshop and Conference Proceedings.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007), Measuring and testing dependence by correlation of distances, The Anals of Statistics, 35(6), 2769-2794.
Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995), Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: the 1994 Neyman Memorial Lecture, The Annals of Statistics, 23(6), 1865-1895.
See Also
Examples
############################
### HSIC indices for GSA ###
############################
# Test case 1: the Friedman function
# --> 5 input variables
### GSA with a given model ###
n <- 800
p <- 5
X <- matrix(runif(n*p), n, p)
kernelX <- c("rbf", "rbf", "laplace", "laplace", "sobolev1")
paramX <- c(0.2, 0.3, 0.4, NA, NA)
# kernel for X1: Gaussian kernel with given parameter 0.2
# kernel for X2: Gaussian kernel with given parameter 0.3
# kernel for X3: exponential kernel with given parameter 0.4
# kernel for X4: exponential kernel with automatic computation of the parameter
# kernel for X5: Sobolev kernel (r=1) with no parameter
kernelY <- "raquad"
paramY <- NA
sensi <- sensiHSIC(model=friedman.fun, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY)
print(sensi)
plot(sensi)
title("GSA for the Friedman function")
### GSA with given data ###
Y <- friedman.fun(X)
sensi <- sensiHSIC(model=NULL, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY)
tell(sensi, y=Y)
print(sensi)
### GSA from a prior object of class "sensiHSIC" ###
new.sensi <- sensiHSIC(model=friedman.fun, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY,
estimator.type="U-stat",
sensi=sensi,
save.GM=list(KX=FALSE, KY=FALSE))
print(new.sensi)
# U-statistics are computed without rebuilding all Gram matrices.
# Those Gram matrices are not saved a second time.
##################################
### HSIC-ANOVA indices for GSA ###
##################################
# Test case 2: the Matyas function with Gaussian input variables
# --> 3 input variables (including 1 dummy variable)
n <- 10^3
p <- 2
X <- matrix(rnorm(n*p), n, p)
# The Sobolev kernel (with r=1) is used to achieve the HSIC-ANOVA decomposition.
# Both first-order and total-order HSIC-ANOVA indices are expected.
### AUTOMATIC RESCALING ###
kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=FALSE)
sensi.A <- sensiHSIC(model=matyas.fun, X, kernelX=kernelX, anova=anova)
print(sensi.A)
plot(sensi.A)
title("GSA for the Matyas function")
### PROBLEM REFORMULATION ###
U <- matrix(runif(n*p), n, p)
new.matyas.fun <- function(U){ matyas.fun(qnorm(U)) }
kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=TRUE)
sensi.B <- sensiHSIC(model=new.matyas.fun, U, kernelX=kernelX, anova=anova)
print(sensi.B)
####################################
### T-HSIC indices for target SA ###
####################################
# Test case 3: the Sobol function
# --> 8 input variables
n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)
kernelY <- "categ"
target <- list(c=0.4, type="indicTh")
sensi <- sensiHSIC(model=sobol.fun, X, kernelY=kernelY, target=target)
print(sensi)
plot(sensi)
title("TSA for the Sobol function")
#########################################
### C-HSIC indices for conditional SA ###
#########################################
# Test case 3: the Sobol function
# --> 8 input variables
n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)
cond <- list(c=0.2, type="exp1side", upper=FALSE)
sensi <- sensiHSIC(model=sobol.fun, X, cond=cond)
print(sensi)
plot(sensi)
title("CSA for the Sobol function")
##########################################
### How to deal with discrete outputs? ###
##########################################
# Test case 4: classification of the Ishigami output
# --> 3 input variables
# --> 3 categories
classif <- function(X){
Ytemp <- ishigami.fun(X)
Y <- rep(NA, n)
Y[Ytemp<0] <- 0
Y[Ytemp>=0 & Ytemp<10] <- 1
Y[Ytemp>=10] <- 2
return(Y)
}
###
n <- 10^3
p <- 3
X <- matrix(runif(n*p, -pi, pi), n, p)
kernelY <- "categ"
paramY <- 0
sensi <- sensiHSIC(model=classif, X, kernelY=kernelY, paramY=paramY)
print(sensi)
plot(sensi)
title("GSA for the classified Ishigami function")
############################################
### How to deal with functional outputs? ###
############################################
# Test case 5: the arctangent temporal function
# --> 3 input variables (including 1 dummy variable)
n <- 500
p <- 3
X <- matrix(runif(n*p,-7,7), n, p)
### with a preliminary dimension reduction by PCA ###
kernelY <- list(method="PCA",
data.centering=TRUE, data.scaling=TRUE,
fam="rbf", expl.var=0.95, combi="sum", position="extern")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("PCA-based GSA for the arctangent temporal function")
### with a kernel based on dynamic time warping ###
kernelY <- list(method="DTW", fam="rbf")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("DTW-based GSA for the arctangent temporal function")
### with the global alignment kernel ###
kernelY <- list(method="GAK")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("GAK-based GSA for the arctangent temporal function")