CKT.kernel {CondCopulas}    R Documentation
Estimation of conditional Kendall's tau using kernel smoothing
Description
Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z = z, for a conditioning variable Z.
Conditional Kendall's tau between X_1 and X_2 given Z = z is defined as:
\tau_{X_1, X_2 | Z=z} = P\big( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 \,\big|\, Z_1 = Z_2 = z \big) - P\big( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 \,\big|\, Z_1 = Z_2 = z \big),
where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z).
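To make the definition concrete, the following R sketch approximates it by Monte Carlo at one fixed value z. For illustration only, it assumes that, given Z = z, (X_1, X_2) is bivariate normal with correlation rho = 0.5; in that case Kendall's tau is known to equal (2/pi) arcsin(rho) = 1/3. The sketch draws two independent copies and takes the proportion of concordant pairs minus the proportion of discordant pairs. The names simPair, tauMC and tauTrue are hypothetical.

```r
# Assumption for illustration: given Z = z, (X_1, X_2) is bivariate normal
# with correlation rho, so Kendall's tau is (2 / pi) * asin(rho).
set.seed(1)
m <- 1e5
rho <- 0.5

# Draw m observations of a bivariate normal vector with correlation rho
simPair <- function(m, rho) {
  u <- rnorm(m)
  v <- rnorm(m)
  cbind(u, rho * u + sqrt(1 - rho^2) * v)
}

copy1 <- simPair(m, rho)  # (X_{1,1}, X_{1,2})
copy2 <- simPair(m, rho)  # (X_{2,1}, X_{2,2}), independent of copy1

# Sign of (X_{1,1} - X_{2,1}) (X_{1,2} - X_{2,2}) for each simulated pair
s <- sign((copy1[, 1] - copy2[, 1]) * (copy1[, 2] - copy2[, 2]))

tauMC <- mean(s > 0) - mean(s < 0)  # P(concordant) - P(discordant)
tauTrue <- 2 / pi * asin(rho)       # equals 1/3 for rho = 0.5
c(tauMC = tauMC, tauTrue = tauTrue)
```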
To estimate the conditional Kendall's tau, this function uses a kernel-based estimator, as described in Derumigny and Fermanian (2019).
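The idea of a kernel-based estimator can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the CondCopulas code: it assumes a univariate Z, uses the Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], normalized weights w_i(z) proportional to K((Z_i - z)/h), and the weighted Kendall's tau sum_{i != j} w_i w_j sign((X_{i,1} - X_{j,1})(X_{i,2} - X_{j,2})) / (1 - sum_i w_i^2). Derumigny and Fermanian (2019) study several estimators of this kind, and the package default typeEstCKT = "wdm" may differ in its details. The helpers epaKernel and kernelCKT are hypothetical.

```r
# Epanechnikov kernel, supported on [-1, 1]
epaKernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper (not part of CondCopulas): kernel-weighted
# Kendall's tau at a single conditioning point z, univariate Z assumed.
kernelCKT <- function(X1, X2, Z, z, h) {
  w <- epaKernel((Z - z) / h)
  keep <- w > 0                   # points with Z_i in (z - h, z + h)
  if (sum(keep) < 2) return(NA_real_)
  w <- w[keep] / sum(w[keep])     # normalized weights w_i(z)
  x1 <- X1[keep]
  x2 <- X2[keep]
  # s[i, j] = sign((X_{i,1} - X_{j,1}) * (X_{i,2} - X_{j,2})); zero when i = j
  s <- sign(outer(x1, x1, "-") * outer(x2, x2, "-"))
  # Weighted concordance minus discordance over pairs i != j, rescaled so
  # that the off-diagonal weights w_i * w_j sum to one
  sum(outer(w, w) * s) / (1 - sum(w^2))
}

# Usage: dependence that does not vary with Z, evaluated at three points
set.seed(1)
Z <- runif(500)
X1 <- rnorm(500)
X2 <- X1 + rnorm(500)
sapply(c(0.25, 0.5, 0.75), function(z) kernelCKT(X1, X2, Z, z, h = 0.2))
```

When all weights are equal, this expression reduces to the ordinary sample Kendall's tau, which is a convenient sanity check for the sketch.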
Usage
CKT.kernel(
observedX1,
observedX2,
observedZ,
newZ,
h,
kernel.name = "Epa",
methodCV = "Kfolds",
Kfolds = 5,
nPairs = 10 * length(observedX1),
typeEstCKT = "wdm",
progressBar = TRUE
)
Arguments
observedX1: a vector of n observations of the first variable.
observedX2: a vector of n observations of the second variable.
observedZ: a vector of n observations of the conditioning variable, or a matrix with n rows of observations of the conditioning vector.
newZ: the new values of Z at which the conditional Kendall's tau should be estimated (a vector, or a matrix with one row per point when Z is multivariate).
h: the bandwidth used for kernel smoothing. If this is a vector, then cross-validation is used, following the method given by argument methodCV, to choose the bandwidth among the given values.
kernel.name: name of the kernel used for smoothing. Possible choices include "Epa" (the Epanechnikov kernel, used by default).
methodCV: method used for the cross-validation. The default is "Kfolds" (K-fold cross-validation); leave-one-out cross-validation is also available (see CKT.hCV.l1out).
Kfolds: number of subsamples used, if methodCV = "Kfolds".
nPairs: number of pairs used in the cross-validation criterion, if leave-one-out cross-validation is used.
typeEstCKT: type of estimation of the conditional Kendall's tau; the default is "wdm".
progressBar: if TRUE (the default), a progress bar is displayed to show the progress of the computation.
Details
Choice of the bandwidth h.
The bandwidth must be chosen carefully. In the univariate case, the default kernel (Epanechnikov kernel) has support [-1, 1], so for a bandwidth h, the estimation of the conditional Kendall's tau at Z = z only uses the points for which Z_i \in [z - h, z + h].
As usual in nonparametric estimation, h should be neither too small (to avoid a large variance) nor too large (to avoid a large bias).
We recommend that, for each z at which the conditional Kendall's tau \tau_{X_1, X_2 | Z=z} is estimated, the set \{i: Z_i \in [z - h, z + h]\} contain at least 20 points and no more than 30% of the points of the whole dataset.
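This rule of thumb is easy to check before calling CKT.kernel. The R sketch below, a hypothetical helper that is not part of CondCopulas, counts for each conditioning point z the observations with Z_i in [z - h, z + h] and flags the points where the count is below 20 or above 30% of the sample. It uses the same simulated Z as in the Examples section.

```r
# Hypothetical helper (not part of CondCopulas): check the rule of thumb
# "at least minPoints observations and at most maxFraction * n
# observations in [z - h, z + h]" for every conditioning point z.
checkBandwidthWindows <- function(observedZ, newZ, h,
                                  minPoints = 20, maxFraction = 0.3) {
  n <- length(observedZ)
  counts <- vapply(newZ, function(z) sum(abs(observedZ - z) <= h),
                   FUN.VALUE = integer(1))
  data.frame(z = newZ,
             nInWindow = counts,
             ok = counts >= minPoints & counts <= maxFraction * n)
}

# Same Z as in the Examples section
set.seed(1)
Z <- rnorm(n = 800, mean = 5, sd = 2)
res <- checkBandwidthWindows(Z, newZ = seq(2, 10, by = 0.1), h = 0.1)

# Conditioning points at which h = 0.1 violates the rule of thumb
res$z[!res$ok]
```

With a Gaussian Z, the windows in the tails contain far fewer points than near the mean, so a single global h can satisfy the recommendation in the center of the distribution and violate it at the edges.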
Note that for consistent estimation, as the sample size n tends to infinity, h should tend to 0 while the size of the set \{i: Z_i \in [z - h, z + h]\} should also tend to infinity. Indeed, the conditioning points should get closer and closer to the point of interest z (small h) and more and more numerous (h tending to 0 slowly enough).
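In the univariate case, the two requirements can be made explicit with a standard first-order approximation. This assumes, beyond what is stated above, that the density f_Z of Z is continuous and positive at z; the precise conditions for consistency are those given in Derumigny and Fermanian (2019).

```latex
\mathbb{E}\,\bigl|\{i : Z_i \in [z - h, z + h]\}\bigr|
  = n \, \mathbb{P}\bigl(|Z - z| \le h\bigr)
  \approx 2\, n\, h\, f_Z(z) \quad (h \to 0),
\qquad \text{hence we need} \qquad
h_n \to 0 \quad \text{and} \quad n\, h_n \to \infty .
```

For example, h_n = n^{-1/5} satisfies both conditions, since n h_n = n^{4/5} tends to infinity. With a p-dimensional Z, the volume of the smoothing window scales like h^p, so the analogous requirement involves n h_n^p.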
In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
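The effect of the dimension can be illustrated numerically. The R sketch below assumes, for illustration only, a uniform Z on [0, 1]^p and a box-shaped window [z - h, z + h]^p around z = (0.5, ..., 0.5); this is not necessarily the window used by CKT.kernel for a multivariate Z. With h fixed, the fraction of observations falling in the window shrinks like (2h)^p, so keeping the same number of local points requires a much larger sample. The function name windowFraction is hypothetical.

```r
# Illustration only: uniform Z on [0, 1]^p and a box window
# [z - h, z + h]^p centered at z = (0.5, ..., 0.5).
windowFraction <- function(p, n = 1e5, h = 0.1) {
  Z <- matrix(runif(n * p), nrow = n, ncol = p)
  # an observation is in the window if every coordinate is within h of 0.5
  inWindow <- rowSums(abs(Z - 0.5) <= h) == p
  mean(inWindow)
}

set.seed(1)
fractions <- sapply(1:3, windowFraction)
# Empirical fractions next to the theoretical values (2h)^p
rbind(empirical = fractions, theoretical = (2 * 0.1)^(1:3))
```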
Value
A list with two components:
- estimatedCKT: the vector of size NROW(newZ) containing the values of the estimated conditional Kendall's tau.
- finalh: the bandwidth h that was finally used for kernel smoothing (either the one specified by the user, or the one chosen by cross-validation if multiple bandwidths were given).
References
Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016
See Also
CKT.estimate for other estimators of conditional Kendall's tau.
CKTmatrix.kernel for a generalization of this function when the conditioned vector is of dimension d instead of dimension 2.
See CKT.hCV.l1out for manual selection of the bandwidth h by leave-one-out or K-folds cross-validation.
Examples
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# Gaussian copula (VineCopula family 1) whose Kendall's tau varies with Z
simCopula = VineCopula::BiCopSim(N = N, family = 1,
par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
newZ = seq(2,10,by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
observedX1 = X1, observedX2 = X2, observedZ = Z,
newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT
# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau, col = "black",
type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")