CKT.predict.kNN {CondCopulas}    R Documentation

Prediction of conditional Kendall's tau using nearest neighbors

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)

- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). In other words, conditional Kendall's tau is the difference between the probabilities of observing concordant and discordant pairs from the conditional law of

(X_1, X_2) | Z=z.
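As an illustration of the "concordant minus discordant" definition, here is a minimal base-R sketch for the unconditional case (no conditioning on Z). The helper kendall_by_pairs is hypothetical and not part of CondCopulas; with continuous data (no ties) its output should agree with cor(method = "kendall").

```r
# Hypothetical helper (not part of CondCopulas): empirical Kendall's tau
# computed as P(concordant pair) - P(discordant pair) over all pairs i < j.
kendall_by_pairs <- function(x1, x2) {
  idx <- utils::combn(length(x1), 2)   # 2 x choose(n, 2) matrix of index pairs
  i <- idx[1, ]
  j <- idx[2, ]
  prod_diff <- (x1[i] - x1[j]) * (x2[i] - x2[j])
  mean(prod_diff > 0) - mean(prod_diff < 0)
}

set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50)

tau_pairs <- kendall_by_pairs(x1, x2)
tau_base  <- cor(x1, x2, method = "kendall")
```

The conditional version follows the same idea, except that only pairs whose conditioning values are both close to z contribute.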

This function estimates conditional Kendall's tau using nearest neighbors. This is possible thanks to the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
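The classification point of view can be sketched in base R as follows. This is only an illustration under simplifying assumptions, not the package's implementation: the helper ckt_knn_sketch, the hard threshold h on |Z_i - Z_j|, and the use of the midpoint (Z_i + Z_j)/2 as the predictor of a pair are all choices made for this sketch.

```r
# Hypothetical helper (not the CondCopulas implementation): kNN estimate of
# conditional Kendall's tau at the point z0, for a univariate Z.
ckt_knn_sketch <- function(x1, x2, z, z0, k, h = 0.1) {
  idx <- utils::combn(length(z), 2)
  i <- idx[1, ]
  j <- idx[2, ]
  # Keep only pairs whose conditioning values are close to each other
  keep <- abs(z[i] - z[j]) <= h
  i <- i[keep]
  j <- j[keep]
  # Concordance label of each pair: +1 (concordant) or -1 (discordant)
  concordance <- sign((x1[i] - x1[j]) * (x2[i] - x2[j]))
  # Predictor attached to each pair (assumption: midpoint of the two Z values)
  z_pair <- (z[i] + z[j]) / 2
  # Average the labels of the k pairs nearest to z0
  nearest <- order(abs(z_pair - z0))[seq_len(min(k, length(z_pair)))]
  mean(concordance[nearest])
}

set.seed(1)
n  <- 300
z  <- runif(n)
x1 <- rnorm(n)
# Dependence is negative for z < 0.5 and positive for z > 0.5
x2 <- ifelse(z > 0.5, 1, -1) * x1 + 0.3 * rnorm(n)

tau_low  <- ckt_knn_sketch(x1, x2, z, z0 = 0.2, k = 200)
tau_high <- ckt_knn_sketch(x1, x2, z, z0 = 0.8, k = 200)
```

Averaging labels coded as +1 and -1 directly gives the difference between the local proportions of concordant and discordant pairs, which is why a nearest-neighbor "classifier" on pairs yields an estimate of conditional Kendall's tau.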

Usage

CKT.predict.kNN(
  datasetPairs,
  designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE],
  newZ,
  number_nn,
  weightsVariables = 1,
  normLp = 2,
  constantA = 1,
  partition = NULL,
  verbose = 1,
  lengthVerbose = 100,
  methodSort = "partial.sort"
)

Arguments

datasetPairs

the matrix of pairs and corresponding values of the kernel, as returned by the function datasetPairs.

designMatrix

the matrix of predictors. It must have the same number of variables (columns) as newZ and the same number of observations (rows) as datasetPairs, i.e. there should be one "multivariate observation" of the predictor for each pair.

newZ

the matrix of predictor values at which conditional Kendall's tau is to be estimated, with one row per point.

number_nn

vector of numbers of nearest neighbors to use. If several numbers of neighbors are given, (local) aggregation is performed using Lepski's method on the subsets determined by partition.

weightsVariables

optional argument to give different weights w_j to each variable.

normLp

the p in the weighted p-norm || x ||_p = (\sum_j w_j * |x_j|^p)^(1/p) used to determine the distance in the computation of the nearest neighbors.
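A minimal sketch of such a weighted distance, assuming the usual form (\sum_j w_j |x_j|^p)^(1/p); taking the p-th root does not change which points are nearest. The helper weighted_lp_distance is hypothetical and only illustrates the roles of weightsVariables and normLp.

```r
# Hypothetical helper: weighted L_p distance between each row of 'design'
# and the point 'z0'. 'weights' is recycled to the number of columns.
weighted_lp_distance <- function(design, z0, weights = 1, p = 2) {
  diffs <- abs(sweep(design, 2, z0))             # |x_j - z0_j|, row by row
  w <- rep_len(weights, ncol(design))
  drop(diffs^p %*% w)^(1 / p)
}

design <- rbind(c(0, 0), c(3, 4), c(1, 1))

# Equal weights, p = 2: ordinary Euclidean distance to the origin
d_equal <- weighted_lp_distance(design, z0 = c(0, 0), weights = c(1, 1), p = 2)

# Zero weight on the second variable: only the first coordinate matters
d_first <- weighted_lp_distance(design, z0 = c(0, 0), weights = c(1, 0), p = 2)

# Indices of the rows, from nearest to farthest
neighbor_order <- order(d_equal)
```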

constantA

a tuning parameter that controls the adaptation. The larger it is, the smoother the resulting estimate; the smaller it is, the less smooth.

partition

used only if length(number_nn) > 1. It is either the number of subsets to consider for the local choice of the number of nearest neighbors, or a vector giving the id of each observation among the subsets. If NULL, only one set is used.

verbose

if TRUE, progress information is printed every lengthVerbose iterations.

lengthVerbose

number of iterations between two successive progress messages.

methodSort

the sorting method used to find the nearest neighbors. Possible choices are "ecdf" (uses the ecdf to order the points and find the neighbors) and "partial.sort" (uses a partial sorting algorithm). This parameter should only affect the computation time, not the result.
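To illustrate why the choice of sorting method should not change the selected neighbors, the following base-R sketch finds the k smallest distances in three ways: a partial sort, the empirical cdf, and a full sort. It is not the package's internal code; the variable names and the simulated distances are assumptions made for the example.

```r
set.seed(1)
d <- runif(1000)   # stand-in for distances to a point (continuous, no ties)
k <- 25

# Partial sort: only the k-th smallest value is guaranteed to be in place
threshold  <- sort(d, partial = k)[k]
nn_partial <- which(d <= threshold)

# Empirical cdf: a point has rank <= k iff its ecdf value is at most k / n
Fn      <- stats::ecdf(d)
nn_ecdf <- which(Fn(d) <= (k + 0.5) / length(d))

# Full sort, used here only as a reference
nn_full <- sort(order(d)[seq_len(k)])
```

All three approaches return the same set of indices; they differ only in how much sorting work is done.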

Value

a list with two components, including estimatedCKT, the estimated conditional Kendall's tau at each row of newZ.

References

Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 5) doi:10.1016/j.csda.2019.01.013

See Also

Other estimators of conditional Kendall's tau: CKT.fit.tree, CKT.fit.randomForest, CKT.fit.nNets, CKT.fit.GLM, CKT.kernel, CKT.kendallReg.fit, and the more general wrapper CKT.estimate.

Examples

# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2,10,by = 0.1)
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
estimatedCKT_knn <- CKT.predict.kNN(
  datasetPairs = datasetP,
  newZ = matrix(newZ,ncol = 1),
  number_nn = c(50,80, 100, 120,200),
  partition = 8)

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col="black",
   type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")


[Package CondCopulas version 0.1.3 Index]