prune.tree {tree} | R Documentation |
Cost-complexity Pruning of Tree Object
Description
Determines a nested sequence of subtrees of the supplied tree by recursively “snipping” off the least important splits.
Usage
prune.tree(tree, k = NULL, best = NULL, newdata, nwts,
           method = c("deviance", "misclass"), loss, eps = 1e-3)

prune.misclass(tree, k = NULL, best = NULL, newdata,
               nwts, loss, eps = 1e-3)
Arguments
tree |
fitted model object of class tree. |
k |
cost-complexity parameter defining either a specific subtree of tree (k a scalar) or the sequence of subtrees minimizing the cost-complexity measure (k a vector). |
best |
integer requesting the size (i.e. number of terminal nodes) of a specific subtree in the cost-complexity sequence to be returned. This is an alternative to selecting a subtree by supplying a scalar cost-complexity parameter k. |
newdata |
data frame upon which the sequence of cost-complexity subtrees is evaluated. If missing, the data used to grow the tree are used. |
nwts |
weights for the newdata cases. |
method |
character string denoting the measure of node heterogeneity used to guide cost-complexity pruning. For regression trees, only the default, "deviance", is accepted. |
loss |
a matrix giving for each true class (row) the numeric loss of predicting the class (column). The classes should be in the order of the levels of the response. It is conventional for a loss matrix to have a zero diagonal. The default is 0–1 loss. |
eps |
a lower bound for the probabilities, used to compute deviances if events of predicted probability zero occur in newdata. |
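As a minimal sketch of the loss argument, the block below builds a loss matrix with rows as true classes and columns as predicted classes. The three class labels and the loss of 5 are hypothetical values chosen for illustration; they are not taken from the package.

```r
# Hypothetical class levels; in practice use levels() of the response
# so that the row and column order matches.
lev <- c("A", "B", "C")

# Start from 0-1 loss: zero diagonal, unit loss everywhere else.
loss <- matrix(1, nrow = length(lev), ncol = length(lev),
               dimnames = list(true = lev, predicted = lev))
diag(loss) <- 0

# Assumed asymmetry: predicting "B" when the truth is "A" costs 5.
loss["A", "B"] <- 5

print(loss)
```

A matrix of this shape could then be supplied as the loss argument of prune.tree or prune.misclass, provided its ordering matches the levels of the response.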
Details
Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits, based upon the cost-complexity measure. prune.misclass is an abbreviation for prune.tree(method = "misclass") for use with cv.tree.
If k is supplied, the optimal subtree for that value is returned.
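To illustrate how a scalar k picks one subtree, here is a base R sketch. It assumes the cost-complexity measure has the usual form deviance + k * size; the sizes and deviances are made-up numbers, and cost_complexity and best_size are hypothetical helper names, not part of the package.

```r
# Hypothetical pruning sequence (sizes and deviances are made-up numbers).
size <- c(6, 4, 3, 1)
dev  <- c(100, 110, 125, 200)

# Cost-complexity measure, assumed here to be deviance + k * size.
cost_complexity <- function(k) dev + k * size

# Size of the subtree minimizing the measure for a given scalar k.
best_size <- function(k) size[which.min(cost_complexity(k))]

best_size(0)    # no penalty: the largest tree wins
best_size(10)   # moderate penalty: an intermediate tree
best_size(100)  # heavy penalty: the root-only tree wins
```

Larger values of k penalize terminal nodes more heavily, so the selected subtree shrinks as k grows.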
The response as well as the predictors referred to in the right side of the formula in tree must be present by name in newdata. These data are dropped down each tree in the cost-complexity sequence, and deviances or losses are calculated by comparing the supplied response to the prediction. The function cv.tree() routinely uses the newdata argument in cross-validating the pruning procedure.
A plot method exists for objects of class tree.sequence. It displays the value of the deviance, the number of misclassifications or the total loss for each subtree in the cost-complexity sequence. An additional axis displays the values of the cost-complexity parameter at each subtree.
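The use of newdata can be sketched with a simple holdout split. This assumes the tree and MASS packages are installed; the two-thirds split, the seed, and the names train_idx, fgl_train, fgl_test, fit and seq_test are arbitrary choices for illustration.

```r
library(tree)
data(fgl, package = "MASS")

# Assumed split: a random two-thirds of the rows for training.
set.seed(1)
train_idx <- sample(nrow(fgl), size = round(2 * nrow(fgl) / 3))
fgl_train <- fgl[train_idx, ]
fgl_test  <- fgl[-train_idx, ]

# Grow on the training rows, then score each subtree on held-out rows.
# fgl_test contains the response (type) and all predictors by name.
fit <- tree(type ~ ., data = fgl_train)
seq_test <- prune.misclass(fit, newdata = fgl_test)

# One entry per subtree: size and held-out misclassification count.
data.frame(size = seq_test$size, misclass = seq_test$dev)
```

Calling plot(seq_test) afterwards would draw the same information, with the cost-complexity parameter on the additional axis.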
Value
If k is supplied and is a scalar, a tree object is returned that minimizes the cost-complexity measure for that k. If best is supplied, a tree object of size best is returned. Otherwise, an object of class tree.sequence is returned. The object contains the following components:
size |
number of terminal nodes in each tree in the cost-complexity pruning sequence. |
deviance |
total deviance of each tree in the cost-complexity pruning sequence. |
k |
the value of the cost-complexity pruning parameter of each tree in the sequence. |
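The two kinds of return value can be compared in a short sketch, assuming the tree and MASS packages are installed. The names fit, pr_seq and pr_best are illustrative, and best = 5 is an arbitrary choice. The deviance values are read through $dev, the same accessor the Examples use on a cv.tree() result.

```r
library(tree)
data(fgl, package = "MASS")
fit <- tree(type ~ ., data = fgl)

# Neither k nor best supplied: the whole pruning sequence is returned.
pr_seq <- prune.tree(fit)
class(pr_seq)
str(pr_seq[c("size", "dev", "k")])

# best supplied: a single tree object is returned instead.
pr_best <- prune.tree(fit, best = 5)
class(pr_best)

# Count terminal nodes of the returned tree; this may differ from 5
# if the sequence contains no subtree of exactly that size.
sum(pr_best$frame$var == "<leaf>")
```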
Examples
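library(tree)  # provides tree(), cv.tree() and prune.tree()
data(fgl, package = "MASS")
fgl.tr <- tree(type ~ ., fgl)
print(fgl.tr); plot(fgl.tr)

# Average the cross-validated deviance over five runs of cv.tree()
fgl.cv <- cv.tree(fgl.tr, , prune.tree)
for (i in 2:5)
  fgl.cv$dev <- fgl.cv$dev + cv.tree(fgl.tr, , prune.tree)$dev
fgl.cv$dev <- fgl.cv$dev / 5
plot(fgl.cv)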