Treee {LDATree} | R Documentation |
Classification trees with Linear Discriminant Analysis terminal nodes
Description
Usage
Treee(
formula,
data,
missingMethod = c("meanFlag", "newLevel"),
splitMethod = "LDscores",
pruneMethod = "none",
numberOfPruning = 10,
maxTreeLevel = 4,
minNodeSize = NULL,
verbose = FALSE
)
Arguments
formula |
an object of class formula, which has the form response ~ predictors (for example, Species ~ . as in the Examples). |
data |
a data frame that contains both predictors and the response. Missing values are allowed in predictors but not in the response. |
missingMethod |
Missing-value solutions for numerical variables and factor variables. The default shown in Usage is c("meanFlag", "newLevel"). |
splitMethod |
the splitting rule in the LDATree growing process. For now,
|
pruneMethod |
the model selection method in the LDATree growing process, which controls the size of the tree. By default, it's set to "none". |
numberOfPruning |
controls the number of cross-validations performed during pruning. It is 10 by default. |
maxTreeLevel |
controls the largest tree size possible for either a direct-stopping tree or a CV-pruned tree. Adding one extra level (depth) introduces an additional layer of nodes at the bottom of the current tree. For example, when the maximum level is 1 (or 2), the maximum tree size is 3 (or 7) nodes. |
minNodeSize |
controls the minimum node size. Think carefully before changing this value. Setting a large number might result in early stopping and reduced accuracy. By default, it's set to one plus the number of classes in the response variable. |
verbose |
a logical. If TRUE, the function provides additional diagnostic messages or detailed output about its progress or internal workings. Default is FALSE, where the function runs silently without additional output. |
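The size bound quoted for maxTreeLevel matches a full binary tree: counting the root as level 0, a tree whose deepest level is L has at most 2^(L + 1) - 1 nodes. The sketch below checks that arithmetic and the stated minNodeSize default. Both maxTreeSize and defaultMinNodeSize are hypothetical helper names used for illustration; they are not part of LDATree.

```r
# Hypothetical helpers (not part of LDATree) illustrating the documented bounds.

# Maximum number of nodes in a binary tree whose deepest level is `maxTreeLevel`
# (root counted as level 0): 1 + 2 + ... + 2^L = 2^(L + 1) - 1.
maxTreeSize <- function(maxTreeLevel) {
  2^(maxTreeLevel + 1) - 1
}

# Documented default for minNodeSize: one plus the number of response classes.
defaultMinNodeSize <- function(response) {
  nlevels(factor(response)) + 1
}

maxTreeSize(1:4)                  # 3 7 15 31
defaultMinNodeSize(iris$Species)  # 4
```

With the default maxTreeLevel = 4, the tree can therefore hold at most 31 nodes, and for iris (three classes) a node would need at least 4 observations under the default minNodeSize.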
Details
Unlike other classification trees, LDATree integrates LDA throughout the entire tree-growing process. Here is a breakdown of its distinctive features:
- The tree searches for the best binary split based on sample quantiles of the first linear discriminant score.
- An LDA/GSVD model is fitted for each terminal node (for more details, refer to ldaGSVD()).
- Missing values can be imputed using the mean, median, or mode, with optional missing flags available.
- By default, the tree employs a direct-stopping rule. However, cross-validation using the alpha-pruning from CART is also provided.
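The quantile-based split search can be pictured with a rough sketch. It is not the package's implementation: MASS::lda stands in for ldaGSVD(), the decile grid is an arbitrary choice, and Gini impurity is an assumed criterion, since this page does not state which criterion LDATree optimizes.

```r
# Sketch only: MASS::lda stands in for the package's ldaGSVD(), and Gini
# impurity is an assumed split criterion, not necessarily the one LDATree uses.
library(MASS)

# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# First linear discriminant score for every observation
ldaFit <- lda(Species ~ ., data = iris)
ld1 <- predict(ldaFit, iris)$x[, 1]

# Candidate cut points: interior sample quantiles (deciles) of the LD1 scores
cuts <- unique(quantile(ld1, probs = seq(0.1, 0.9, by = 0.1)))

# Size-weighted impurity of the two children produced by each cut
childImpurity <- sapply(cuts, function(cut) {
  left <- ld1 <= cut
  (sum(left) * gini(iris$Species[left]) +
     sum(!left) * gini(iris$Species[!left])) / length(ld1)
})

# The cut with the lowest weighted child impurity is the chosen binary split
bestCut <- cuts[which.min(childImpurity)]
bestCut
```

Restricting candidates to sample quantiles keeps the search to a handful of cut points per node instead of one per observation, which is the practical appeal of this kind of rule.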
Value
An object of class Treee containing the following components:
- formula: the formula passed to Treee()
- treee: a list of all the tree nodes, where each node is an object of class TreeeNode
- missingMethod: the missingMethod passed to Treee()
An object of class TreeeNode containing the following components:
- currentIndex: the node index of the current node
- currentLevel: the level of the current node in the tree
- idxRow, idxCol: the row and column indices showing which portion of the data is used in the current node
- currentLoss: the training error (number of misclassified samples) of the current node
- accuracy: the training accuracy of the current node
- proportions: the observed frequency of each class
- parent: the node index of its parent
- children: the node indices of its direct children (not including its children's children)
- misReference: a data frame that serves as the reference for missing value imputation
- splitCut: the cut point of the split
- nodeModel: one of 'mode' or 'LDA'. It shows the type of predictive model fitted in the current node
- nodePredict: the fitted predictive model in the current node. It is an object of class ldaGSVD if LDA is fitted. If nodeModel = 'mode', then it is a vector of length one, showing the plurality class.
- offsprings: (available only if pruneMethod = 'CV') all terminal descendant nodes of the current node
- offspringLoss: (available only if pruneMethod = 'CV') the sum of the currentLoss of the offsprings of the current node
- alpha: (available only if pruneMethod = 'CV') the alpha in alpha-pruning from CART
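To show how currentLoss, offspringLoss, and offsprings could combine into alpha, here is the textbook CART weakest-link formula written with the field names above. This is an assumption about the relationship, not a statement of how LDATree computes it; the package may scale the losses differently. The function cartAlpha and the numbers in the example are hypothetical.

```r
# Textbook CART weakest-link alpha for an internal node, written with the
# field names from this page. Assumption: LDATree may scale or compute this
# differently; this is the standard formula only.
cartAlpha <- function(currentLoss, offspringLoss, nOffsprings) {
  stopifnot(nOffsprings >= 2)  # an internal node has at least two terminal descendants
  (currentLoss - offspringLoss) / (nOffsprings - 1)
}

# Hypothetical node: 12 training errors if collapsed to a leaf, 4 errors
# summed over its 3 terminal descendants.
cartAlpha(currentLoss = 12, offspringLoss = 4, nOffsprings = 3)  # 4
```

Under this definition, a small alpha means the subtree removes few errors per extra terminal node, so such nodes are the first candidates to be pruned.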
Examples
fit <- Treee(Species~., data = iris)
# Use cross-validation to prune the tree
fitCV <- Treee(Species~., data = iris, pruneMethod = "CV")
# prediction
predict(fit,iris)
# plot the overall tree
plot(fit)
# plot a certain node
plot(fit, iris, node = 1)