CKT.estimate {CondCopulas}

R Documentation

Estimation of conditional Kendall's tau between two variables X1 and X2 given Z = z

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z) - P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). In other words, conditional Kendall's tau is the difference between the probabilities of observing concordant and discordant pairs from the conditional law of (X_1, X_2) | Z=z.

This function can use several different estimators of conditional Kendall's tau; see the description of the parameter methodEstimation for the list of possibilities.
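To make the definition concrete, here is a minimal base-R sketch of a naive kernel-weighted estimator: each pair of observations is weighted by how close both conditioning values are to z, and the estimate is the weighted share of concordant pairs minus the weighted share of discordant pairs. The helper `naiveCKT`, its argument names and the Gaussian kernel are assumptions made for illustration; this is not the package's implementation and it only handles a univariate conditioning variable.

```r
# Hypothetical helper (not part of CondCopulas): naive kernel-weighted
# estimate of conditional Kendall's tau at a single point z.
naiveCKT <- function(x1, x2, z_obs, z, h) {
  # Gaussian kernel weights: observations with Z close to z count more
  w <- dnorm((z_obs - z) / h)
  w <- w / sum(w)
  # Sign of (X_{i,1} - X_{j,1}) * (X_{i,2} - X_{j,2}) for every pair (i, j):
  # +1 for a concordant pair, -1 for a discordant pair
  concordance <- sign(outer(x1, x1, "-") * outer(x2, x2, "-"))
  # Weight of the pair (i, j); pairs (i, i) are excluded
  pairWeights <- outer(w, w)
  diag(pairWeights) <- 0
  sum(pairWeights * concordance) / sum(pairWeights)
}

# Toy data (made up for this sketch): X2 depends positively on X1
set.seed(1)
z_obs <- runif(200)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200)
naiveCKT(x1, x2, z_obs, z = 0.5, h = 0.1)
```

The result always lies in [-1, 1]. It equals 1 when all pairs are concordant and -1 when all pairs are discordant.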

Usage

CKT.estimate(
  observedX1, observedX2, observedZ,
  newZ = observedZ, methodEstimation, h,
  listPhi = if(methodEstimation == "kendallReg")
               {list( function(x){return(x)}   ,
                      function(x){return(x^2)} ,
                      function(x){return(x^3)} )
               } else {list(identity)} ,
  ...)

Arguments

`observedX1`	a vector of `n` observations of the first variable
`observedX2`	a vector of `n` observations of the second variable
`observedZ`	a vector of `n` observations of the conditioning variable, or a matrix with `n` rows of observations of the conditioning vector (if `Z` is multivariate).
`newZ`	the new values of the conditioning variable `Z` at which conditional Kendall's tau should be estimated. If `observedZ` is a vector, then `newZ` must be a vector as well. If `observedZ` is a matrix, then `newZ` must be a matrix with the same number of columns (the dimension of `Z`).
`methodEstimation`	the method used to estimate conditional Kendall's tau. Possible values are: `"kernel"`, kernel smoothing, as described in Derumigny & Fermanian (2019b); `"kendallReg"`, a regression-type model, as described in Derumigny & Fermanian (2020); and `"tree"`, `"randomForest"`, `"logit"`, `"nearestNeighbors"` and `"neuralNetwork"`, which exploit the relationship between conditional Kendall's tau and classification problems and apply the respective classification algorithm, as described in Derumigny & Fermanian (2019a).
`h`	the bandwidth
`listPhi`	the list of transformations to be applied to the conditioning variable `Z` (in case of regression-type models).
`...`	other parameters passed to the estimating functions `CKT.fit.tree`, `CKT.fit.randomForest`, `CKT.fit.GLM`, `CKT.fit.nNets`, `CKT.predict.kNN`, `CKT.kernel` and `CKT.kendallReg.fit`.
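As an illustration of the `listPhi` argument, the sketch below shows how a list of transformations can be turned into a matrix of regressors, one column per transformation of `Z`. The names `z` and `designMatrix` are hypothetical, and the way the package applies the transformations internally is an assumption here; only base R is used.

```r
# The same three transformations as the "kendallReg" default: z, z^2, z^3
listPhi <- list(function(x) x, function(x) x^2, function(x) x^3)

# Hypothetical values of the conditioning variable
z <- c(1, 2, 3)

# Apply every transformation to z: one row per value of z,
# one column per transformation
designMatrix <- sapply(listPhi, function(phi) phi(z))
colnames(designMatrix) <- c("z", "z^2", "z^3")
designMatrix
```

A custom basis can be supplied in the same way, for example `list(identity, exp)`.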

Value

A vector of estimated values of conditional Kendall's tau, one for each point of newZ.

References

Derumigny, A., & Fermanian, J. D. (2019a). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. doi:10.1016/j.csda.2019.01.013

Derumigny, A., & Fermanian, J. D. (2019b). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016

Derumigny, A., & Fermanian, J. D. (2020). On Kendall’s regression. Journal of Multivariate Analysis, 178, 104610. doi:10.1016/j.jmva.2020.104610

Examples

# We simulate from a conditional copula
set.seed(1)
N = 300
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2,10,by = 0.1)
h = 0.1
estimatedCKT_tree <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "tree", h = h)

estimatedCKT_rf <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "randomForest", h = h)

estimatedCKT_GLM <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "logit", h = h,
  listPhi = list(function(x){return(x)}, function(x){return(x^2)},
                 function(x){return(x^3)}) )

estimatedCKT_kNN <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "nearestNeighbors", h = h,
  number_nn = c(50,80, 100, 120,200),
  partition = 4
  )

estimatedCKT_nNet <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "neuralNetwork", h = h)

estimatedCKT_kernel <- CKT.estimate(
  observedX1 = X1, observedX2 = X2, observedZ = Z,
  newZ = newZ,
  methodEstimation = "kernel", h = h)

estimatedCKT_kendallReg <- CKT.estimate(
   observedX1 = X1, observedX2 = X2, observedZ = Z,
   newZ = newZ,
   methodEstimation = "kendallReg", h = h)

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in other colors)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col="black",
   type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_tree, col = "red")
lines(newZ, estimatedCKT_rf, col = "blue")
lines(newZ, estimatedCKT_GLM, col = "green")
lines(newZ, estimatedCKT_kNN, col = "purple")
lines(newZ, estimatedCKT_nNet, col = "coral")
lines(newZ, estimatedCKT_kernel, col = "skyblue")
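lines(newZ, estimatedCKT_kendallReg, col = "darkgreen")
# Legend mapping each color to its estimation method
legend("topleft", lty = 1, cex = 0.7,
   legend = c("true", "tree", "randomForest", "logit", "nearestNeighbors",
              "neuralNetwork", "kernel", "kendallReg"),
   col = c("black", "red", "blue", "green", "purple",
           "coral", "skyblue", "darkgreen"))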
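
The visual comparison above can be complemented by a numerical summary. The following sketch defines a hypothetical helper, `meanAbsError` (not part of CondCopulas), which computes the mean absolute error between an estimated curve and the true curve on a common grid. The "estimate" in the toy illustration is made up and is not the output of any estimator.

```r
# Hypothetical helper (not part of CondCopulas): mean absolute error
# between an estimated curve and the true curve on the same grid.
meanAbsError <- function(estimate, truth) {
  stopifnot(length(estimate) == length(truth))
  mean(abs(estimate - truth), na.rm = TRUE)
}

# Toy illustration: a made-up "estimate" shifted by 0.05 from the truth
gridZ <- seq(2, 10, by = 0.1)
truth <- -0.9 + 1.8 * pnorm(gridZ, mean = 5, sd = 2)
toyEstimate <- truth + 0.05
meanAbsError(toyEstimate, truth)
```

With the objects created in the examples, a call such as `meanAbsError(estimatedCKT_kernel, trueConditionalTau)` would give one number per method, which makes the estimators easier to rank than the plot alone.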

[Package CondCopulas version 0.1.3 Index]