tree.control {logicDT} | R Documentation
Control parameters for fitting decision trees
Description
Configure the fitting process of individual decision trees.
Usage
tree.control(
nodesize = 10,
split_criterion = "gini",
alpha = 0.05,
cp = 0.001,
smoothing = "none",
mtry = "none",
covariable = "final_4pl"
)
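A minimal example of configuring the control object; the parameter values below are chosen purely for illustration, while the call signature follows the Usage section above:

```r
library(logicDT)

# Illustrative configuration (values chosen for demonstration only):
ctrl <- tree.control(
  nodesize = 5,             # allow smaller terminal nodes
  split_criterion = "gini",
  cp = 0.01                 # require a larger impurity reduction per split
)
```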
Arguments
nodesize
Minimum number of samples contained in a terminal node. This parameter ensures that enough samples are available for performing predictions, which includes fitting regression models such as 4pL models.
split_criterion
Splitting criterion for deciding when and how to split. The default is "gini".
alpha
Significance threshold for the likelihood ratio tests used with the 4pL and linear splitting criteria.
cp
Complexity parameter. This parameter determines by which amount the impurity has to be reduced to further split a node. Here, the total tree impurity is considered. See details for a specific formula. Only used with the Gini and MSE splitting criteria.
smoothing
Shall the leaf predictions for risk estimation be smoothed? The default is "none" (no smoothing).
mtry
Shall the tree fitting process be randomized as in random forests? The default is "none" (no randomization).
covariable
How shall optional quantitative covariables be handled? The default is "final_4pl".
Details
For the Gini or MSE splitting criterion, a node t is split further if any considered split s satisfies

P(t) \cdot \Delta I(s,t) > \texttt{cp},

where P(t) denotes the empirical node probability and \Delta I(s,t) the impurity reduction achieved by the split s. Otherwise, the node is declared a leaf.
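The stopping rule above can be sketched in base R for a binary outcome under the Gini criterion. This is an illustrative sketch, not logicDT internals; the node data and the total sample size are hypothetical:

```r
# Gini impurity of a node for a binary (0/1) outcome.
gini <- function(y) {
  p <- mean(y)        # proportion of class 1 in the node
  2 * p * (1 - p)
}

# Hypothetical node t with 10 samples, split s into two children:
y       <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y_left  <- c(1, 1, 1, 1, 0)
y_right <- c(0, 0, 0, 0, 0)

n_total <- 20                      # assumed total training sample size
P_t <- length(y) / n_total         # empirical node probability P(t)

# Impurity reduction Delta I(s, t): parent impurity minus the
# sample-weighted impurities of the two children.
delta_I <- gini(y) -
  (length(y_left)  / length(y)) * gini(y_left) -
  (length(y_right) / length(y)) * gini(y_right)

# Split the node only if the weighted reduction exceeds cp:
split_node <- P_t * delta_I > 0.001
```

Here gini(y) = 0.48, delta_I = 0.32, and P(t) * delta_I = 0.16, so the node would be split at the default cp of 0.001.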
For continuous outcomes, cp will be scaled by the empirical variance of y to ensure the right scaling, i.e., cp <- cp * var(y). Since the impurity measure for continuous outcomes is the mean squared error, this can be interpreted as controlling the minimum reduction of the normalized mean squared error (the squared NRMSE).
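The variance rescaling amounts to a single multiplication; the outcome values below are made up for illustration:

```r
# For a continuous outcome, cp is rescaled by the empirical variance
# of y, so the threshold acts on the normalized MSE (squared NRMSE):
y  <- c(2.1, 3.5, 1.8, 4.0, 2.9)   # hypothetical continuous outcome
cp <- 0.001
cp_scaled <- cp * var(y)
```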
If one chooses the 4pL or linear splitting criterion, likelihood ratio tests are employed that test the alternative of better-fitting individual models. The corresponding test statistic asymptotically follows a \chi^2 distribution whose degrees of freedom are given by the difference in the number of model parameters, i.e., 2 \cdot 4 - 4 = 4 degrees of freedom in the case of 4pL models and 2 \cdot 2 - 2 = 2 degrees of freedom in the case of linear models.
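The decision such a likelihood ratio test implies can be sketched with base R's pchisq; the test statistic below is a hypothetical value, not one produced by logicDT:

```r
alpha   <- 0.05
lr_stat <- 11.2          # hypothetical likelihood ratio test statistic

# Degrees of freedom = difference in the number of model parameters:
df_4pl    <- 2 * 4 - 4   # two 4pL models (4 params each) vs. one joint model
df_linear <- 2 * 2 - 2   # two linear models (2 params each) vs. one joint model

# Asymptotic chi-squared p-value; the split is supported if it falls
# below the significance threshold alpha:
p_value      <- pchisq(lr_stat, df = df_4pl, lower.tail = FALSE)
accept_split <- p_value < alpha
```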
For binary outcomes, choosing to fit linear models for evaluating the splits or for modeling the leaves actually leads to fitting LDA (linear discriminant analysis) models.
Value
An object of class tree.control, which is a list of all necessary tree parameters.