CKT.fit.randomForest {CondCopulas}

R Documentation

Fit a Random Forest that can be used for the estimation of conditional Kendall's tau.

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

  P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)
  - P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). In other words, conditional Kendall's tau is the difference between the probabilities of observing concordant and discordant pairs from the conditional law of (X_1, X_2) given Z = z.
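
The "concordant minus discordant" reading of the definition can be checked numerically in the simpler unconditional case. The following is a minimal base-R sketch, not part of the package: it counts concordant and discordant pairs over all pairs of observations and compares the result with R's built-in Kendall's tau. The simulated data and the variable names (`x1`, `x2`, `tau_pairs`) are illustrative assumptions; the conditional version would additionally restrict or weight pairs so that both Z values are close to z.

```r
# Pairwise estimate of (unconditional) Kendall's tau:
# P(concordant) - P(discordant), estimated over all pairs i < j.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # positively dependent on x1

# 2 x choose(n, 2) matrix whose columns are the index pairs (i, j) with i < j
idx <- utils::combn(n, 2)
prodDiff <- (x1[idx[1, ]] - x1[idx[2, ]]) * (x2[idx[1, ]] - x2[idx[2, ]])

tau_pairs <- mean(prodDiff > 0) - mean(prodDiff < 0)

# With continuous data (no ties), this coincides with the built-in estimator.
tau_builtin <- cor(x1, x2, method = "kendall")
all.equal(tau_pairs, tau_builtin)
```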

These functions estimate and predict conditional Kendall's tau using a random forest. This is possible thanks to the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
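
To illustrate the classification point of view, here is a minimal base-R sketch under stated assumptions. For continuous variables, P(concordant) + P(discordant) = 1, so conditional Kendall's tau equals 2 * P(concordant | Z = z) - 1, and any classifier that predicts the concordance indicator of a pair yields an estimate. The sketch uses a plain logistic regression (`glm`) as a simple stand-in for the classifier; it is not the random forest fitted by `CKT.fit.randomForest`, and the names `tauTrue`, `pairData`, `fitGLM` as well as the closeness threshold 0.05 are hypothetical choices made for this example.

```r
set.seed(1)
n <- 300
z <- runif(n)

# Hypothetical true conditional Kendall's tau, increasing in z
tauTrue <- function(z) -0.8 + 1.6 * z

# Gaussian copula: tau = (2 / pi) * asin(rho), hence rho = sin(pi * tau / 2)
rho <- sin(pi / 2 * tauTrue(z))
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)

# All pairs (i, j) with i < j; keep only those whose Z values are close
idx  <- utils::combn(n, 2)
i    <- idx[1, ]
j    <- idx[2, ]
keep <- abs(z[i] - z[j]) < 0.05

pairData <- data.frame(
  concordant = as.integer((x1[i] - x1[j]) * (x2[i] - x2[j]) > 0)[keep],
  zPair      = ((z[i] + z[j]) / 2)[keep]
)

# Classification step: predict the concordance of a pair from its Z value
fitGLM <- glm(concordant ~ zPair, family = binomial(), data = pairData)

zNew   <- c(0.1, 0.5, 0.9)
pHat   <- predict(fitGLM, newdata = data.frame(zPair = zNew), type = "response")
tauHat <- 2 * pHat - 1   # convert P(concordant | z) into Kendall's tau
cbind(zNew, tauHat, tauTrue = tauTrue(zNew))
```

Replacing the logistic regression with an ensemble of classification trees fitted on the same kind of pair dataset gives the general idea behind a random-forest estimator.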

Usage

CKT.fit.randomForest(
  datasetPairs,
  designMatrix = data.frame(x = datasetPairs[, 2:(ncol(datasetPairs) - 3)]),
  n,
  nTree = 10,
  mindev = 0.008,
  mincut = 0,
  nObs_per_Tree = ceiling(0.8 * n),
  nVar_per_Tree = ceiling(0.8 * (ncol(datasetPairs) - 4)),
  verbose = FALSE,
  nMaxDepthAllowed = 10
)

CKT.predict.randomForest(fit, newZ)

Arguments

`datasetPairs`	the matrix of pairs and corresponding values of the kernel, as returned by the function `datasetPairs()`.
`designMatrix`	the matrix of predictors to be used for fitting the trees.
`n`	the original sample size of the dataset
`nTree`	number of trees of the Random Forest.
`mindev`	a factor giving the minimum deviation for a node to be split. See `tree::tree.control()` for more details.
`mincut`	the minimum number of observations (of pairs) in a node. See `tree::tree.control()` for more details.
`nObs_per_Tree`	number of observations kept in each tree.
`nVar_per_Tree`	number of variables kept in each tree.
`verbose`	if `TRUE`, a message is printed after fitting each tree.
`nMaxDepthAllowed`	the maximum number of errors of type "the tree cannot be fitted" or "is too deep" before stopping the procedure.
`fit`	result of a call to `CKT.fit.randomForest`.
`newZ`	new matrix of observations, with the same number of variables and the same names as the `designMatrix` that was used to fit the Random Forest.
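
As a quick illustration of how the default values in the Usage section resolve, the sketch below evaluates them by hand. The shape used here is an assumption made for the example: `n = 800` original observations and a matrix of pairs with 5 columns (held in the hypothetical variable `nColPairs`), which the defaults suggest corresponds to a single conditioning variable, since 5 - 4 = 1.

```r
# Hypothetical inputs: original sample size and number of columns
# of the matrix of pairs
n         <- 800
nColPairs <- 5

# Default number of observations kept in each tree
nObs_per_Tree <- ceiling(0.8 * n)                 # 640

# Default number of variables kept in each tree
nVar_per_Tree <- ceiling(0.8 * (nColPairs - 4))   # 1

# Columns of the matrix of pairs used to build the default designMatrix
designColumns <- 2:(nColPairs - 3)                # column 2 only
```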

Value

`CKT.fit.randomForest` returns a list with two components:

`list_tree`	a list of size `nTree` composed of all the fitted trees.
`list_variables`	a list of size `nTree` composed of the (predictor) variables for each tree.

`CKT.predict.randomForest` returns a vector of (predicted) conditional Kendall's taus whose length equals the number of rows of `newZ`.

References

Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 4) doi:10.1016/j.csda.2019.01.013

Examples

# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# family = 1 is the Gaussian copula in VineCopula
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

# Build the dataset of pairs of observations, then fit the random forest
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
est_RF = CKT.fit.randomForest(datasetPairs = datasetP, n = N,
  mindev = 0.008)

newZ = seq(1,10,by = 0.1)
prediction = CKT.predict.randomForest(fit = est_RF,
   newZ = data.frame(x=newZ))
# Comparison between true Kendall's tau (in red)
# and estimated Kendall's tau (in black)
plot(newZ, prediction, type = "l", ylim = c(-1,1))
lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")

[Package CondCopulas version 0.1.3 Index]