compareTreecalcs {DAAG}

R Documentation

Error rate comparisons for tree-based classification

Description

Compare error rates, between different functions and different tree-selection rules, for a random division of the data into training and test sets of approximately equal size.

Usage

compareTreecalcs(x = yesno ~ ., data = DAAG::spam7, cp = 0.00025, fun = c("rpart",
"randomForest"))

Arguments

`x`	model formula
`data`	a data frame in which to interpret the variables named in the formula
`cp`	value of the complexity parameter `cp` that is passed to `rpart()`
`fun`	one or both of "rpart" and "randomForest"

Details

Data are randomly divided into two subsets, I and II, of approximately equal size. The function(s) are applied in the standard way to subset I, and the error rates that these calculations themselves provide (cross-validation error for `rpart()`, out-of-bag error for `randomForest()`) are returned. Predictions are then made for subset II, allowing the calculation of a completely independent set of error rates.
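The division into subsets I and II, and the independent test-set error rate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of `compareTreecalcs()`: it uses the `kyphosis` data shipped with `rpart` as a stand-in for `DAAG::spam7`, splitting by `sample()` is an assumed way of getting an approximately equal division, and the names `dataI`, `dataII`, `fitI` and `testErr` are hypothetical.

```r
library(rpart)

data(kyphosis, package = "rpart")  # stand-in data; compareTreecalcs() defaults to DAAG::spam7
set.seed(29)                       # arbitrary seed, so that the split is reproducible

# Assumed approach: put a random half of the rows in subset I, the rest in subset II
n <- nrow(kyphosis)
idxI <- sample(n, n %/% 2)
dataI  <- kyphosis[idxI, ]   # subset I (training data)
dataII <- kyphosis[-idxI, ]  # subset II (test data)

# Fit in the standard way, using subset I only
fitI <- rpart(Kyphosis ~ ., data = dataI, method = "class", cp = 0.00025)

# Completely independent error rate: proportion misclassified in subset II
predII <- predict(fitI, newdata = dataII, type = "class")
testErr <- mean(predII != dataII$Kyphosis)
testErr
```

Because subset II plays no part in fitting, `testErr` gives a check on the error rates that the fitting function reports for subset I.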

Value

If `"rpart"` is included in `fun`, the following components:

`rpSEcvI`	the estimated cross-validation error rate when `rpart()` is run on the training data (subset I) and the one-standard-error rule is used to choose the tree
`rpcvI`	the estimated cross-validation error rate when `rpart()` is run on subset I and the tree that gives the minimum cross-validated error rate is used
`rpSEtest`	the error rate when the model that leads to `rpSEcvI` is used to make predictions for subset II
`rptest`	the error rate when the model that leads to `rpcvI` is used to make predictions for subset II
`nSErule`	number of splits in the tree chosen by the one-standard-error rule
`nREmin`	number of splits in the tree that gives the minimum cross-validated error
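How the two selection rules can be read off the `cptable` that `rpart()` returns is sketched below. This is an illustrative sketch, not the DAAG implementation: it assumes the `kyphosis` stand-in data, that the cross-validated error rate is obtained by rescaling the relative `xerror` column by the root-node error (as `printcp()` reports it), and the variable names are hypothetical analogues of `nSErule`, `nREmin`, `rpSEcvI` and `rpcvI`.

```r
library(rpart)

data(kyphosis, package = "rpart")  # stand-in data
set.seed(31)                       # the cross-validation folds are random

fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class", cp = 0.00025)
cpt <- fit$cptable  # columns: CP, nsplit, rel error, xerror, xstd

# Row with the minimum cross-validated (relative) error
iMin <- which.min(cpt[, "xerror"])
# One-standard-error rule: smallest tree whose xerror is within one SE of the minimum
iSE <- min(which(cpt[, "xerror"] <= cpt[iMin, "xerror"] + cpt[iMin, "xstd"]))

nREmin  <- cpt[iMin, "nsplit"]  # splits for the minimum-error tree
nSErule <- cpt[iSE, "nsplit"]   # splits for the one-SE tree

# xerror is relative to the root-node error; rescale to an error rate
rootErr <- fit$frame$dev[1] / fit$frame$n[1]
cvMin <- rootErr * cpt[iMin, "xerror"]  # analogue of rpcvI
cvSE  <- rootErr * cpt[iSE, "xerror"]   # analogue of rpSEcvI

# Pruned trees that would then be used to predict for subset II
fitMin <- prune(fit, cp = cpt[iMin, "CP"])
fitSE  <- prune(fit, cp = cpt[iSE, "CP"])
c(nSErule = nSErule, nREmin = nREmin, cvSE = cvSE, cvMin = cvMin)
```

The one-standard-error rule always picks a tree with no more splits than the minimum-error tree, trading a slightly higher estimated error for a simpler model.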

If `"randomForest"` is included in `fun`, the following components:

`rfcvI`	the out-of-bag (OOB) error rate when `randomForest()` is run on subset I
`rftest`	the error rate when the model that leads to `rfcvI` is used to make predictions for subset II
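The comparison between the out-of-bag error on subset I and the error on subset II can be sketched as below. This is a minimal sketch, not the DAAG code: it assumes the `randomForest` package is installed, uses the `kyphosis` data from `rpart` as a stand-in, and the names `oobErr` and `testErr` are hypothetical analogues of `rfcvI` and `rftest`.

```r
library(randomForest)

data(kyphosis, package = "rpart")  # stand-in data (requires rpart to be installed)
set.seed(37)                       # arbitrary seed for the split and the forest

# Random, approximately equal division into subsets I and II
n <- nrow(kyphosis)
idxI <- sample(n, n %/% 2)
dataI  <- kyphosis[idxI, ]
dataII <- kyphosis[-idxI, ]

rf <- randomForest(Kyphosis ~ ., data = dataI)

# OOB error rate after the final tree has been added (analogue of rfcvI)
oobErr <- rf$err.rate[rf$ntree, "OOB"]

# Error rate for predictions on the held-out subset II (analogue of rftest)
predII <- predict(rf, newdata = dataII)
testErr <- mean(predII != dataII$Kyphosis)
c(OOB = oobErr, test = testErr)
```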

Author(s)

John Maindonald

[Package DAAG version 1.25.6 Index]