CKT.fit.randomForest {CondCopulas} | R Documentation
Fit a Random Forest that can be used for the estimation of conditional Kendall's tau.
Description
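Let $X_1$ and $X_2$ be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between $X_1$ and $X_2$ given $Z = z$
for a conditioning variable $Z$.
Conditional Kendall's tau between $X_1$ and $X_2$ given $Z = z$ is defined as:
$$\tau_{1,2|Z=z} = P\big((X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 \mid Z_1 = Z_2 = z\big) - P\big((X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 \mid Z_1 = Z_2 = z\big),$$
where $(X_{1,1}, X_{1,2}, Z_1)$ and $(X_{2,1}, X_{2,2}, Z_2)$
are two independent and identically distributed copies of $(X_1, X_2, Z)$.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of $(X_1, X_2)$ given $Z = z$.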
These functions estimate and predict conditional Kendall's tau using a random forest. This is possible thanks to the relationship between the estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimating conditional Kendall's tau is equivalent to predicting concordance in the space of pairs of observations.
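The link between Kendall's tau and concordance of pairs can be illustrated with a minimal base-R sketch. It covers only the unconditional case and does not use the package; the simulated data and the variable names (`x1`, `x2`, `w`) are assumptions made for illustration. Each pair of observations gets a label, +1 if concordant and -1 if discordant, and Kendall's tau is the difference between the proportions of the two labels. The conditional version would additionally restrict or weight the pairs according to how close their conditioning values are to the point of interest.

```r
# Base-R sketch: Kendall's tau as P(concordant) - P(discordant) over pairs.
# The data-generating model below is an arbitrary illustration.
set.seed(123)
n <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)

# All pairs (i, j) with i < j
idx <- utils::combn(n, 2)
i <- idx[1, ]
j <- idx[2, ]

# Label of each pair: +1 if concordant, -1 if discordant
w <- sign((x1[i] - x1[j]) * (x2[i] - x2[j]))

# Difference between the proportions of concordant and discordant pairs
tau_pairs <- mean(w == 1) - mean(w == -1)

# Equivalent "classification" form when there are no ties:
# 2 * P(concordant) - 1
tau_classif <- 2 * mean(w == 1) - 1

# Both agree with the usual sample Kendall's tau
tau_cor <- cor(x1, x2, method = "kendall")
```

Since the label `w` is binary for continuous data, predicting it from covariates is a classification problem, which is the viewpoint that makes tree-based methods applicable.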
Usage
CKT.fit.randomForest(
datasetPairs,
designMatrix = data.frame(x = datasetPairs[, 2:(ncol(datasetPairs) - 3)]),
n,
nTree = 10,
mindev = 0.008,
mincut = 0,
nObs_per_Tree = ceiling(0.8 * n),
nVar_per_Tree = ceiling(0.8 * (ncol(datasetPairs) - 4)),
verbose = FALSE,
nMaxDepthAllowed = 10
)
CKT.predict.randomForest(fit, newZ)
Arguments
datasetPairs |
the matrix of pairs and corresponding values of the kernel,
as provided by datasetPairs. |
designMatrix |
the matrix of predictors to be used for fitting the trees. |
n |
the original sample size of the dataset |
nTree |
number of trees of the Random Forest. |
mindev |
a factor giving the minimum deviation for a node to be split.
See tree::tree.control() for more details. |
mincut |
the minimum number of observations (of pairs) in a node.
See tree::tree.control() for more details. |
nObs_per_Tree |
number of observations kept in each tree. |
nVar_per_Tree |
number of variables kept in each tree. |
verbose |
if TRUE, progress messages are printed while the trees are fitted. |
nMaxDepthAllowed |
the maximum number of errors of type "the tree cannot be fitted" or "is too deep" before stopping the procedure. |
fit |
result of a call to CKT.fit.randomForest. |
newZ |
new matrix of observations, with the same number of variables
and the same names as the designMatrix that was used to fit the Random Forest. |
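The default values of designMatrix and nVar_per_Tree in the usage above suggest that the pairs matrix has one leading column, then the conditioning variables, then three trailing columns. The base-R sketch below builds a mock matrix with that shape to show which columns the defaults select and how the resulting predictors are named, which matters when building newZ. The helper names (`make_mock_pairs`, `default_design`) and the mock column names are hypothetical, and the column layout is an assumption inferred from the defaults, not taken from the package source.

```r
# Mock pairs matrix with d conditioning variables.
# Assumed layout: 1 leading column, d covariate columns, 3 trailing columns.
make_mock_pairs <- function(d, nPairs = 6) {
  Zcols <- matrix(rnorm(nPairs * d), ncol = d,
                  dimnames = list(NULL, paste0("Z", seq_len(d))))
  cbind(concordant = sample(c(-1, 1), nPairs, replace = TRUE),
        Zcols,
        kernel = runif(nPairs),
        i = seq_len(nPairs),
        j = rev(seq_len(nPairs)))
}

# Same expression as the default value of designMatrix
default_design <- function(pairs) {
  data.frame(x = pairs[, 2:(ncol(pairs) - 3)])
}

set.seed(42)
pairs1 <- make_mock_pairs(d = 1)
pairs2 <- make_mock_pairs(d = 2)

design1 <- default_design(pairs1)  # single predictor, named "x"
design2 <- default_design(pairs2)  # two predictors, named "x.Z1" and "x.Z2"

# Same expression as the default value of nVar_per_Tree
nVar1 <- ceiling(0.8 * (ncol(pairs1) - 4))
nVar2 <- ceiling(0.8 * (ncol(pairs2) - 4))
```

With a single conditioning variable the default predictor is named `x`, which is consistent with passing `data.frame(x = newZ)` at prediction time. With several conditioning variables the names depend on the column names of the pairs matrix, so checking `names()` of the design matrix before building newZ is a reasonable precaution.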
Value
CKT.fit.randomForest returns a list with two components:
- list_tree: a list of size nTree composed of all the fitted trees.
- list_variables: a list of size nTree composed of the (predictor) variables for each tree.
CKT.predict.randomForest returns a vector of (predicted) conditional Kendall's taus
of the same size as the number of rows of newZ.
References
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 4) doi:10.1016/j.csda.2019.01.013
Examples
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
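# Gaussian copula (family = 1) whose parameter depends on Z
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
# Transform the uniform margins to standard normal margins
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
# Build the dataset of pairs, then fit the random forest on it
datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
est_RF = CKT.fit.randomForest(datasetPairs = datasetP, n = N,
    mindev = 0.008)
# Predict conditional Kendall's tau on a grid of values of Z
newZ = seq(1, 10, by = 0.1)
prediction = CKT.predict.randomForest(fit = est_RF,
    newZ = data.frame(x = newZ))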
# Comparison between true Kendall's tau (in red)
# and estimated Kendall's tau (in black)
plot(newZ, prediction, type = "l", ylim = c(-1,1))
lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")