CKT.predict.kNN {CondCopulas} | R Documentation |
Prediction of conditional Kendall's tau using nearest neighbors
Description
Let X_1 and X_2 be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z.
Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:
P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)
- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),
where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z).
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of (X_1, X_2) | Z=z.
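To make the "concordant minus discordant" reading concrete, here is a minimal base R sketch of the unconditional analogue: it counts concordant and discordant pairs in a sample and compares the result with `cor(..., method = "kendall")`. The helper name `kendall_by_pairs` is hypothetical and not part of CondCopulas; the conditional version would additionally restrict attention to pairs whose conditioning values are close to z.

```r
# Empirical Kendall's tau as (concordant - discordant) / number of pairs.
# Base R only; `kendall_by_pairs` is a hypothetical helper name.
kendall_by_pairs <- function(x1, x2) {
  stopifnot(length(x1) == length(x2), length(x1) >= 2)
  idx <- utils::combn(length(x1), 2)  # all pairs (i, j) with i < j
  # Sign of the product: +1 if concordant, -1 if discordant, 0 for ties
  s <- sign((x1[idx[1, ]] - x1[idx[2, ]]) * (x2[idx[1, ]] - x2[idx[2, ]]))
  mean(s)
}

set.seed(1)
x1 <- rnorm(50)
x2 <- 0.5 * x1 + rnorm(50)
tau_pairs <- kendall_by_pairs(x1, x2)
tau_base <- cor(x1, x2, method = "kendall")
# The two values agree when the sample contains no ties
agree <- isTRUE(all.equal(tau_pairs, tau_base))
```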
This function estimates conditional Kendall's tau using nearest neighbors. This is possible because of the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
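The classification point of view can be illustrated with a simplified base R sketch. It is not the CondCopulas implementation: the hard threshold `h` on |Z_i - Z_j|, the use of the midpoint of Z_i and Z_j as the location of a pair, and the helper name `ckt_knn_sketch` are all assumptions made for illustration. Each retained pair is labelled +1 (concordant) or -1 (discordant), and the estimate at z is the average label of the k pairs nearest to z.

```r
# Simplified sketch (assumptions, not the package's algorithm):
# predict the concordance label of a pair from the k nearest pairs in Z.
set.seed(1)
n <- 200
Z <- runif(n)
X1 <- rnorm(n)
# Dependence between X1 and X2 goes from negative to positive as Z grows
X2 <- (2 * Z - 1) * X1 + rnorm(n, sd = 0.5)

# Keep only pairs whose conditioning values are close (assumed rule)
h <- 0.05
idx <- utils::combn(n, 2)
isClose <- abs(Z[idx[1, ]] - Z[idx[2, ]]) < h
i <- idx[1, isClose]
j <- idx[2, isClose]

pairSign <- sign((X1[i] - X1[j]) * (X2[i] - X2[j]))  # +1 / -1 labels
pairZ <- (Z[i] + Z[j]) / 2                            # assumed pair location

# Average the labels of the k pairs closest to z
ckt_knn_sketch <- function(z, k) {
  nearest <- order(abs(pairZ - z))[seq_len(k)]
  mean(pairSign[nearest])
}

z_grid <- c(0.1, 0.5, 0.9)
est <- sapply(z_grid, ckt_knn_sketch, k = 100)
```

Averaging ±1 labels yields a value in [-1, 1], which is why the problem can be treated as predicting concordance for a new pair located at z.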
Usage
CKT.predict.kNN(
  datasetPairs,
  designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE],
  newZ,
  number_nn,
  weightsVariables = 1,
  normLp = 2,
  constantA = 1,
  partition = NULL,
  verbose = 1,
  lengthVerbose = 100,
  methodSort = "partial.sort"
)
Arguments
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictors.
They must have the same number of variables as |
newZ |
the matrix of predictor values at which the conditional Kendall's tau is to be estimated. |
number_nn |
vector of numbers of nearest neighbors to use.
If several numbers of neighbors are given, (local) aggregation is performed
using Lepski's method on the subset determined by the |
weightsVariables |
optional argument to give
different weights |
normLp |
the p in the weighted p-norm |
constantA |
a tuning parameter that controls the adaptation. Higher values give a smoother estimate; smaller values give a less smooth one. |
partition |
used only if |
verbose |
if TRUE, this prints information each |
lengthVerbose |
number of iterations between two successive progress messages. |
methodSort |
is the sorting method used to find the nearest neighbors.
Possible choices are
|
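As an illustration of how weightsVariables and normLp could enter the nearest-neighbor search, the sketch below computes a weighted p-norm distance in base R. The exact form used by the package is not stated here; the formula (sum_j w_j |x_j - z_j|^p)^(1/p) and the helper name `weighted_lp_dist` are assumptions.

```r
# Weighted p-norm distance between a point z and each row of a matrix.
# Assumed form: (sum_j w_j * |x_j - z_j|^p)^(1/p); the package may differ.
weighted_lp_dist <- function(designMatrix, z, weights = 1, p = 2) {
  designMatrix <- as.matrix(designMatrix)
  weights <- rep_len(weights, ncol(designMatrix))
  absDiff <- abs(sweep(designMatrix, 2, z))      # |x_ij - z_j|
  weighted <- sweep(absDiff^p, 2, weights, `*`)  # apply one weight per column
  rowSums(weighted)^(1 / p)
}

M <- rbind(c(0, 0), c(3, 4), c(1, 1))
d <- weighted_lp_dist(M, z = c(0, 0))                            # Euclidean case
d_first_only <- weighted_lp_dist(M, z = c(0, 0), weights = c(1, 0))
nearest2 <- order(d)[1:2]                        # row indices of the 2 nearest points
```

Setting a weight to 0 removes the corresponding variable from the distance, so the choice of weights changes which observations count as neighbors.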
Value
a list with two components:
- estimatedCKT: the estimated conditional Kendall's tau, a vector of the same size as the number of rows in newZ;
- vect_k_chosen: the locally selected number of nearest neighbors, a vector of the same size as the number of rows in newZ.
References
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 5) doi:10.1016/j.csda.2019.01.013
See Also
See also other estimators of conditional Kendall's tau:
CKT.fit.tree, CKT.fit.randomForest, CKT.fit.nNets, CKT.fit.GLM, CKT.kernel, CKT.kendallReg.fit,
and the more general wrapper CKT.estimate.
Examples
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
newZ = seq(2,10,by = 0.1)
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
estimatedCKT_knn <- CKT.predict.kNN(
  datasetPairs = datasetP,
  newZ = matrix(newZ, ncol = 1),
  number_nn = c(50, 80, 100, 120, 200),
  partition = 8)
# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau, col = "black",
     type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")