CKT.fit.randomForest {CondCopulas}R Documentation

Fit a Random Forest that can be used for the estimation of conditional Kendall's tau.

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)

- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). In other words, conditional Kendall's tau is the difference between the probabilities of observing concordant and discordant pairs from the conditional law of

(X_1, X_2) | Z=z.

These functions estimate and predict conditional Kendall's tau using a random forest. This is possible by the relationship between estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimation of conditional Kendall's tau is equivalent to the prediction of concordance in the space of pairs of observations.

Usage

CKT.fit.randomForest(
  datasetPairs,
  designMatrix = data.frame(x = datasetPairs[, 2:(ncol(datasetPairs) - 3)]),
  n,
  nTree = 10,
  mindev = 0.008,
  mincut = 0,
  nObs_per_Tree = ceiling(0.8 * n),
  nVar_per_Tree = ceiling(0.8 * (ncol(datasetPairs) - 4)),
  verbose = FALSE,
  nMaxDepthAllowed = 10
)

CKT.predict.randomForest(fit, newZ)

Arguments

datasetPairs

the matrix of pairs and corresponding values of the kernel as provided by datasetPairs.

designMatrix

the matrix of predictor to be used for the fitting of the tree

n

the original sample size of the dataset

nTree

number of trees of the Random Forest.

mindev

a factor giving the minimum deviation for a node to be splitted. See tree::tree.control() for more details.

mincut

the minimum number of observations (of pairs) in a node See tree::tree.control() for more details.

nObs_per_Tree

number of observations kept in each tree.

nVar_per_Tree

number of variables kept in each tree.

verbose

if TRUE, a message is printed after fitting each tree.

nMaxDepthAllowed

the maximum number of errors of type "the tree cannot be fitted" or "is too deep" before stopping the procedure.

fit

result of a call to CKT.fit.randomForest.

newZ

new matrix of observations, with the same number of variables. and same names as the designMatrix that was used to fit the Random Forest.

Value

a list with two components

CKT.predict.randomForest returns a vector of (predicted) conditional Kendall's taus of the same size as the number of rows of the newZ.

References

Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 4) doi:10.1016/j.csda.2019.01.013

Examples

# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
est_RF = CKT.fit.randomForest(datasetPairs = datasetP, n = N,
  mindev = 0.008)

newZ = seq(1,10,by = 0.1)
prediction = CKT.predict.randomForest(fit = est_RF,
   newZ = data.frame(x=newZ))
# Comparison between true Kendall's tau (in red)
# and estimated Kendall's tau (in black)
plot(newZ, prediction, type = "l", ylim = c(-1,1))
lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")


[Package CondCopulas version 0.1.3 Index]