CKT.predict.kNN {CondCopulas}    R Documentation
Prediction of conditional Kendall's tau using nearest neighbors
Description
Let $X_1$ and $X_2$ be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between $X_1$ and $X_2$ given $Z = z$
for a conditioning variable $Z$.
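Conditional Kendall's tau between $X_1$ and $X_2$ given $Z = z$ is defined as:
$$\tau_{1,2|Z=z} = P\big[(X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 \,\big|\, Z_1 = Z_2 = z\big] - P\big[(X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 \,\big|\, Z_1 = Z_2 = z\big],$$
where $(X_{1,1}, X_{1,2}, Z_1)$ and $(X_{2,1}, X_{2,2}, Z_2)$
are two independent and identically distributed copies of $(X_1, X_2, Z)$.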
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of $(X_1, X_2)$ given $Z = z$.
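The definition can be checked on its unconditional counterpart: the empirical Kendall's tau is the proportion of concordant pairs minus the proportion of discordant pairs among all pairs of observations. The conditional version restricts this comparison to pairs whose conditioning values are (close to) $z$. The following base-R sketch is only an illustration of the definition; the helper empirical_kendall_tau() is hypothetical and not part of CondCopulas.

```r
# Hypothetical helper: empirical (unconditional) Kendall's tau computed directly
# from the definition, (#concordant - #discordant) / #pairs.
empirical_kendall_tau <- function(x1, x2) {
  n <- length(x1)
  stopifnot(length(x2) == n, n >= 2)
  # Row/column indices of all unordered pairs (i, j) with i < j
  idx <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
  i <- idx[, 1]
  j <- idx[, 2]
  # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
  s <- sign((x1[i] - x1[j]) * (x2[i] - x2[j]))
  (sum(s > 0) - sum(s < 0)) / nrow(idx)
}

set.seed(1)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200)
tau_hat <- empirical_kendall_tau(x1, x2)
# Without ties, this coincides with base R's Kendall correlation
all.equal(tau_hat, cor(x1, x2, method = "kendall"))
```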
This function estimates conditional Kendall's tau using nearest neighbors. This is possible thanks to the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
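The classification point of view can be illustrated by a simplified, self-contained sketch in base R. It is not the algorithm implemented by CKT.predict.kNN: the helper knn_pairs_tau(), the threshold h used to keep only pairs with close conditioning values, and the use of the midpoint (Z_i + Z_j)/2 as the location of a pair are all assumptions made for illustration. Each pair receives the label +1 if it is concordant and -1 otherwise, and the estimate at z is the average label of the k pairs whose midpoints are closest to z, i.e. a k-nearest-neighbor regression of the concordance label.

```r
# Simplified, hypothetical illustration (not the CondCopulas implementation).
knn_pairs_tau <- function(x1, x2, z, new_z, k, h) {
  n <- length(z)
  # All unordered pairs (i, j) with i < j
  idx <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
  # Assumption: keep only pairs whose conditioning values are within h of each other
  idx <- idx[abs(z[idx[, 1]] - z[idx[, 2]]) < h, , drop = FALSE]
  i <- idx[, 1]
  j <- idx[, 2]
  # Class label of each pair: +1 if concordant, -1 otherwise (ties counted as -1)
  w <- ifelse((x1[i] - x1[j]) * (x2[i] - x2[j]) > 0, 1, -1)
  # Assumption: each pair is located at the midpoint of its two conditioning values
  mid <- (z[i] + z[j]) / 2
  stopifnot(k <= length(w))
  # k-NN regression of the label: average over the k pairs with the closest midpoints
  vapply(new_z, function(z0) mean(w[order(abs(mid - z0))[seq_len(k)]]), numeric(1))
}

# Usage on simulated data whose true conditional Kendall's tau increases with z
set.seed(1)
n <- 800
z <- rnorm(n, mean = 5, sd = 2)
tau_true <- -0.9 + 1.8 * pnorm(z, mean = 5, sd = 2)
rho <- sin(pi * tau_true / 2)  # Gaussian copula: tau = (2 / pi) * asin(rho)
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
est <- knn_pairs_tau(x1, x2, z, new_z = c(2, 5, 8), k = 200, h = 0.2)
est
```

Averaging the ±1 labels directly estimates P(concordant) - P(discordant) near z, which is why a classifier (or regressor) of the concordance label yields an estimate of conditional Kendall's tau.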
Usage
CKT.predict.kNN(
datasetPairs,
designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE],
newZ,
number_nn,
weightsVariables = 1,
normLp = 2,
constantA = 1,
partition = NULL,
verbose = 1,
lengthVerbose = 100,
methodSort = "partial.sort"
)
Arguments
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictors.
They must have the same number of variables as |
newZ |
the matrix of predictors for which we want to estimate the conditional Kendall's taus at these values. |
number_nn |
vector of numbers of nearest neighbors to use.
If several numbers of neighbors are given, (local) aggregation is performed
using Lepski's method on the subset determined by the |
weightsVariables |
optional argument to give
different weights |
normLp |
the p in the weighted p-norm |
constantA |
a tuning parameter that controls the adaptation: the higher it is, the smoother the estimate; the smaller it is, the less smooth the estimate. |
partition |
used only if |
verbose |
if TRUE, prints information each |
lengthVerbose |
number of iterations between two successive progress messages. |
methodSort |
the sorting method used to find the nearest neighbors.
Possible choices are
|
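The search for neighbors can be sketched as follows. The helpers weighted_lp_dist() and knn_indices() are hypothetical, and the form of the weighted norm, $(\sum_j w_j |x_j - z_j|^p)^{1/p}$, is an assumption about how weightsVariables and normLp combine. A partial sort only places the k-th smallest distance in its sorted position, which is enough to identify the k nearest neighbors without sorting all distances; this is one way to implement the kind of partial sorting suggested by the default methodSort = "partial.sort".

```r
# Hypothetical helper: weighted Lp distance between each row of `design` and the
# point `z0`, assumed to be (sum_j w_j * |x_j - z0_j|^p)^(1/p).
weighted_lp_dist <- function(design, z0, weights = 1, p = 2) {
  design <- as.matrix(design)
  weights <- rep_len(weights, ncol(design))  # recycle a scalar weight
  diffs <- abs(sweep(design, 2, z0))         # |x_ij - z0_j|, column by column
  drop((diffs^p %*% weights)^(1 / p))
}

# Hypothetical helper: indices of the k nearest rows, found with a partial sort.
# sort(d, partial = k) only guarantees that the k-th element is in its sorted place.
knn_indices <- function(d, k) {
  kth <- sort(d, partial = k)[k]   # k-th smallest distance
  candidates <- which(d <= kth)    # at least k indices (more if there are ties)
  candidates[order(d[candidates])][seq_len(k)]
}

# Usage: four points on a line, nearest neighbors of the origin
design <- cbind(c(3, 0, 2, 1), 0)
d <- weighted_lp_dist(design, z0 = c(0, 0))
knn_indices(d, k = 2)  # returns c(2, 4)
```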
Value
a list with two components
- estimatedCKT: the estimated conditional Kendall's tau, a vector of the same size as the number of rows in newZ;
- vect_k_chosen: the locally selected number of nearest neighbors, a vector of the same size as the number of rows in newZ.
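The local choice of the number of neighbors reported in vect_k_chosen can be illustrated with a generic Lepski-type rule. This sketch is an assumption, not the exact rule of Derumigny and Fermanian (2019, Algorithm 5) or of this function: the helper lepski_select() is hypothetical, candidate values of k are scanned in increasing order, and the scan stops at the first k whose estimate is farther than constantA times the standard error of some estimate based on fewer neighbors. The standard error of an average of k labels in {-1, +1} with mean tau is approximated by sqrt((1 - tau^2) / k).

```r
# Generic Lepski-type selection (an illustration, not necessarily the rule used by
# CKT.predict.kNN): keep increasing k while its estimate stays within
# constantA * (standard error) of every estimate based on fewer neighbors.
lepski_select <- function(tau_by_k, k_values, constantA = 1) {
  stopifnot(length(tau_by_k) == length(k_values), !is.unsorted(k_values))
  # Standard error of a mean of +1/-1 labels with mean tau: sqrt((1 - tau^2) / k)
  se <- sqrt(pmax(1 - tau_by_k^2, 0) / k_values)
  chosen <- 1
  for (m in seq_along(k_values)[-1]) {
    prev <- seq_len(m - 1)
    if (all(abs(tau_by_k[m] - tau_by_k[prev]) <= constantA * se[prev])) {
      chosen <- m   # compatible with all smaller k: accept the larger k
    } else {
      break         # first incompatible k: stop the scan
    }
  }
  list(k = k_values[chosen], tau = tau_by_k[chosen])
}

# Usage: the estimate with k = 200 departs sharply from the others
tau_by_k <- c(0.50, 0.52, 0.48, 0.10)
k_values <- c(50, 80, 100, 200)
res_default <- lepski_select(tau_by_k, k_values, constantA = 1)   # k = 100
res_large <- lepski_select(tau_by_k, k_values, constantA = 10)    # k = 200
```

A larger constantA tolerates larger deviations and therefore selects more neighbors, which matches the description of constantA as a parameter controlling smoothness.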
References
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 5) doi:10.1016/j.csda.2019.01.013
See Also
See also other estimators of conditional Kendall's tau:
CKT.fit.tree, CKT.fit.randomForest, CKT.fit.nNets, CKT.fit.GLM,
CKT.kernel, CKT.kendallReg.fit,
and the more general wrapper CKT.estimate.
Examples
# Load the package (VineCopula must also be installed for the simulation step)
library(CondCopulas)

# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# Gaussian copula (family = 1) whose Kendall's tau varies with Z
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

# Points at which the conditional Kendall's tau is estimated
newZ = seq(2, 10, by = 0.1)
# Build the dataset of pairs used by the estimator
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
# Several candidate numbers of neighbors: aggregated by Lepski's method
estimatedCKT_knn <- CKT.predict.kNN(
  datasetPairs = datasetP,
  newZ = matrix(newZ, ncol = 1),
  number_nn = c(50, 80, 100, 120, 200),
  partition = 8)

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau, col = "black",
     type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")
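If VineCopula is not available, the simulation step of the example can be approximated in base R. This sketch relies on the fact that, for a Gaussian copula (family = 1 in VineCopula), Kendall's tau and the correlation parameter satisfy $\tau = (2/\pi)\arcsin(\rho)$, so that $\rho = \sin(\pi\tau/2)$; with standard normal margins, (X1, X2) given Z is then bivariate normal with correlation rho. The random draws differ from those of VineCopula::BiCopSim, so the simulated sample (and hence the estimates) will not be identical to the example above.

```r
# Base-R simulation of the same conditional Gaussian copula model (a sketch, not
# a replacement for VineCopula): standard normal margins, Kendall's tau depending on Z.
set.seed(1)
N <- 800
Z <- rnorm(n = N, mean = 5, sd = 2)
conditionalTau <- -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# Gaussian copula: tau = (2 / pi) * asin(rho), hence rho = sin(pi * tau / 2)
rho <- sin(pi * conditionalTau / 2)
# Bivariate normal with correlation rho (one value per observation)
X1 <- rnorm(N)
X2 <- rho * X1 + sqrt(1 - rho^2) * rnorm(N)

# Sanity check: the empirical Kendall's tau has the expected sign in each tail of Z
tauLowZ <- cor(X1[Z < 3], X2[Z < 3], method = "kendall")
tauHighZ <- cor(X1[Z > 7], X2[Z > 7], method = "kendall")
c(tauLowZ, tauHighZ)
```

The resulting X1, X2 and Z can then be passed to datasetPairs() exactly as in the example.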