R: Prediction of conditional Kendall's tau using nearest...

CKT.predict.kNN {CondCopulas}

R Documentation

Prediction of conditional Kendall's tau using nearest neighbors

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)

- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). In other words, conditional Kendall's tau is the difference between the probabilities of observing concordant and discordant pairs from the conditional law of

(X_1, X_2) | Z=z.
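As an illustration of this definition, the sketch below computes the unconditional analogue on simulated data, i.e. the difference between the proportions of concordant and discordant pairs. It uses only base R; the variable names are illustrative and not part of the package. With continuous data (no ties), the result should agree with `cor(..., method = "kendall")`.

```r
# Kendall's tau as P(concordant) - P(discordant), computed over all pairs.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)

# Sign of (x1[i] - x1[j]) * (x2[i] - x2[j]) for every pair (i, j)
signs <- sign(outer(x1, x1, "-") * outer(x2, x2, "-"))
pairSigns <- signs[upper.tri(signs)]   # keep each pair i < j exactly once

# Proportion of concordant pairs minus proportion of discordant pairs
tauPairs <- mean(pairSigns > 0) - mean(pairSigns < 0)

# Reference value from base R
tauCor <- cor(x1, x2, method = "kendall")
print(c(tauPairs = tauPairs, tauCor = tauCor))
```

The conditional version restricts attention to pairs whose conditioning values are both (close to) z, which is what the estimator described here approximates.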

This function estimates conditional Kendall's tau using nearest neighbors. This is made possible by the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
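The classification viewpoint can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: pairs are kept when their conditioning values differ by less than a hypothetical threshold `h`, the predictor attached to a pair is taken to be the average of its two conditioning values, and the estimate at a point `z0` is the mean concordance sign over the `k` pairs nearest to `z0`. The helper `knnCKT` is hypothetical.

```r
set.seed(1)
n <- 300
z <- runif(n)
# The dependence between x1 and x2 goes from negative to positive as z grows
x1 <- rnorm(n)
x2 <- (2 * z - 1) * x1 + 0.5 * rnorm(n)

# Enumerate all pairs (i, j) with i < j
idx <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
i <- idx[, 1]
j <- idx[, 2]

# Assumption: keep only pairs whose conditioning values are close
h <- 0.05
keep <- abs(z[i] - z[j]) < h
i <- i[keep]
j <- j[keep]

# "Label" of each pair: +1 if concordant, -1 if discordant
concordance <- sign((x1[i] - x1[j]) * (x2[i] - x2[j]))
# Assumption: one predictor value per pair, the midpoint of the two z values
zPair <- (z[i] + z[j]) / 2

# Hypothetical helper: average the labels of the k pairs nearest to z0
knnCKT <- function(z0, k) {
  nearest <- order(abs(zPair - z0))[seq_len(k)]
  mean(concordance[nearest])
}

estLow  <- knnCKT(0.1, k = 500)
estHigh <- knnCKT(0.9, k = 500)
print(c(estLow = estLow, estHigh = estHigh))
```

Averaging labels coded as +1 and -1 directly gives the difference between the estimated probabilities of concordance and discordance among the selected pairs.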

Usage

CKT.predict.kNN(
  datasetPairs,
  designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE],
  newZ,
  number_nn,
  weightsVariables = 1,
  normLp = 2,
  constantA = 1,
  partition = NULL,
  verbose = 1,
  lengthVerbose = 100,
  methodSort = "partial.sort"
)

Arguments

`datasetPairs`	the matrix of pairs and corresponding values of the kernel, as returned by the function `datasetPairs`.
`designMatrix`	the matrix of predictors. It must have the same number of columns as `newZ` and the same number of rows as `datasetPairs`, i.e. there should be one "multivariate observation" of the predictor for each pair.
`newZ`	the matrix of predictor values at which conditional Kendall's tau is to be estimated, with one row per evaluation point.
`number_nn`	vector of numbers of nearest neighbors to use. If several numbers of neighbors are given, (local) aggregation is performed using Lepski's method on the subsets determined by `partition`.
`weightsVariables`	optional argument to give different weights `w_j` to each variable.
`normLp`	the p in the weighted p-norm `||x||_p^p = \sum_j w_j * |x_j|^p` used to determine the distance in the computation of the nearest neighbors.
`constantA`	a tuning parameter that controls the adaptation. The higher it is, the smoother the estimate; the lower it is, the less smooth the estimate.
`partition`	used only if `length(number_nn) > 1`. It is either the number of subsets to consider for the local choice of the number of nearest neighbors, or a vector giving the id of the subset to which each observation belongs. If `NULL`, only one set is used.
`verbose`	if `TRUE`, progress information is printed every `lengthVerbose` iterations.
`lengthVerbose`	number of iterations between two consecutive progress messages.
`methodSort`	the sorting method used to find the nearest neighbors. Possible choices are `ecdf` (uses the ecdf to order the points and find the neighbors) and `partial.sort` (uses a partial sorting algorithm). This parameter should only affect the computation time, not the result.
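To make the roles of `weightsVariables`, `normLp` and `methodSort` more concrete, here is a minimal base-R sketch of a weighted p-norm neighbor search based on a partial sort. The helper `weightedLpDist` and the simulated data are hypothetical and only illustrate the idea; the package's internal code may differ. The 1/p-th root is omitted because it is monotone and therefore does not change which points are nearest.

```r
# Hypothetical helper: sum_j w_j * |x_j - z0_j|^p for each row x of the matrix
weightedLpDist <- function(designMatrix, z0, weights = 1, p = 2) {
  diffs <- abs(sweep(designMatrix, 2, z0, FUN = "-"))
  # Recycle a scalar weight to one weight per column
  w <- rep_len(weights, ncol(designMatrix))
  drop((diffs^p) %*% w)
}

set.seed(1)
designMatrix <- matrix(rnorm(200), ncol = 2)  # 100 points, 2 variables
z0 <- c(0, 0)                                 # point at which neighbors are sought
d <- weightedLpDist(designMatrix, z0, weights = c(1, 2), p = 2)

k <- 10
# Partial sort: only the k-th smallest value is guaranteed to be in place,
# which is cheaper than sorting the whole vector
threshold <- sort(d, partial = k)[k]
neighborsPartial <- which(d <= threshold)

# Same neighbors obtained from a full ordering, for comparison
neighborsFull <- order(d)[seq_len(k)]
print(setequal(neighborsPartial, neighborsFull))
```

Both approaches select the same set of neighbors when there are no ties in the distances, which is consistent with the remark that the sorting method should only matter for computation time.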

Value

A list with two components:

`estimatedCKT`	the estimated conditional Kendall's tau, a vector with one entry per row of `newZ`;
`vect_k_chosen`	the locally selected number of nearest neighbors, a vector with one entry per row of `newZ`.

References

Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 5) doi:10.1016/j.csda.2019.01.013

Examples

# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2,10,by = 0.1)
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
estimatedCKT_knn <- CKT.predict.kNN(
  datasetPairs = datasetP,
  newZ = matrix(newZ,ncol = 1),
  number_nn = c(50,80, 100, 120,200),
  partition = 8)

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col="black",
   type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")

[Package CondCopulas version 0.1.3 Index]