FFTrees {FFTrees} | R Documentation |
Main function to create and apply fast-and-frugal trees (FFTs)
Description
FFTrees is the workhorse function of the FFTrees package for creating fast-and-frugal trees (FFTs).
FFTs are decision algorithms for solving binary classification tasks, i.e., they predict the values of a binary criterion variable based on one or more predictor variables (cues).
Using FFTrees on data usually generates a range of FFTs and corresponding summary statistics (as an FFTrees object) that can then be printed, plotted, and examined further.
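To make the idea concrete, the following base R sketch hand-codes a single FFT: the verbal tree used in the Examples section (cues age, sex, and chol). It only illustrates how an FFT checks one cue at a time and exits as soon as a cue allows a decision. The function name classify_fft and the four example cases are hypothetical, and this is not how the package represents or evaluates trees internally.

```r
# Hand-coded FFT, assuming numeric cues age, sex (coded 0/1), and chol.
# Each level checks one cue; a case exits at the first level whose
# condition it meets, and the final level has two exits.
classify_fft <- function(age, sex, chol) {
  ifelse(age < 50, FALSE,     # Level 1: exit with "False"
    ifelse(sex == 1, TRUE,    # Level 2: exit with "True"
      chol > 300))            # Level 3: "True" if chol > 300, otherwise "False"
}

# Four made-up cases, one per possible exit.
cases <- data.frame(
  age  = c(40, 60, 60, 60),
  sex  = c(1, 1, 0, 0),
  chol = c(350, 200, 350, 200)
)
cases$prediction <- classify_fft(cases$age, cases$sex, cases$chol)
print(cases)
```

Because every level except the last has exactly one exit, most cases are classified after looking at only a few cues, which is what makes such trees fast and frugal.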
The criterion and predictor variables are specified in formula notation.
Based on the settings of data and data.test, FFTs are trained on a (required) training dataset (given the set of current goal values) and evaluated on (or used to predict) an (optional) test dataset.
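One way to prepare the two inputs is to split a single data frame before calling FFTrees. The sketch below uses only base R and the built-in mtcars data, with am as a stand-in binary criterion. The 70/30 split, the seed, and the object names are arbitrary assumptions made for illustration.

```r
# Split one data frame into a training and a test set (base R only).
set.seed(123)                       # arbitrary seed, for reproducibility
d <- mtcars
d$am <- d$am == 1                   # binary criterion as logical

n_train   <- floor(0.7 * nrow(d))   # assumed 70/30 split
train_idx <- sample(nrow(d), size = n_train)

d_train <- d[train_idx, ]
d_test  <- d[-train_idx, ]

# Both sets must share the same structure (columns).
stopifnot(identical(names(d_train), names(d_test)))

# The two data frames could then be passed on, e.g.:
# FFTrees(formula = am ~ ., data = d_train, data.test = d_test)
```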
If an existing FFTrees object (argument object) or tree.definitions are provided as inputs, no new FFTs are created.
When both arguments are provided, tree.definitions take priority over the FFTs in an existing object.
Specifically,
- If tree.definitions are provided, these are assigned to the FFTs of x.
- If no tree.definitions are provided, but an existing FFTrees object (argument object) is provided, the trees from object are assigned to the FFTs of x.
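The precedence described above can be summarized as a small decision function. This is a sketch of the documented rule only, not the package's code: tree_source is a hypothetical helper, and the placeholder strings stand in for real inputs.

```r
# Which source supplies the FFTs? (illustrative helper, not part of FFTrees)
tree_source <- function(object = NULL, tree.definitions = NULL) {
  if (!is.null(tree.definitions)) {
    "tree.definitions"   # takes priority, even if object is also given
  } else if (!is.null(object)) {
    "object"             # trees are taken from the existing FFTrees object
  } else {
    "new"                # neither is given: new FFTs are created
  }
}

tree_source()
tree_source(object = "an FFTrees object")
tree_source(object = "an FFTrees object", tree.definitions = "definitions")
```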
Create and evaluate fast-and-frugal trees (FFTs).
Usage
FFTrees(
  formula = NULL,
  data = NULL,
  data.test = NULL,
  algorithm = "ifan",
  train.p = 1,
  goal = NULL,
  goal.chase = NULL,
  goal.threshold = NULL,
  max.levels = NULL,
  numthresh.method = "o",
  numthresh.n = 10,
  repeat.cues = TRUE,
  stopping.rule = "exemplars",
  stopping.par = 0.1,
  sens.w = 0.5,
  cost.outcomes = NULL,
  cost.cues = NULL,
  main = NULL,
  decision.labels = c("False", "True"),
  my.goal = NULL,
  my.goal.fun = NULL,
  my.tree = NULL,
  object = NULL,
  tree.definitions = NULL,
  do.comp = TRUE,
  do.cart = TRUE,
  do.lr = TRUE,
  do.rf = TRUE,
  do.svm = TRUE,
  quiet = list(ini = TRUE, fin = FALSE, mis = FALSE, set = TRUE),
  comp = NULL,
  force = NULL,
  rank.method = NULL,
  rounding = NULL,
  store.data = NULL,
  verbose = NULL
)
Arguments
formula |
A formula. A |
data |
A data frame. A dataset used for training (fitting) FFTs and alternative algorithms.
|
data.test |
A data frame. An optional dataset used for model testing (prediction) with the same structure as data. |
algorithm |
A character string. The algorithm used to create FFTs. Can be |
train.p |
numeric. What percentage of the data to use for training when |
goal |
A character string indicating the statistic to maximize when selecting trees:
|
goal.chase |
A character string indicating the statistic to maximize when constructing trees:
|
goal.threshold |
A character string indicating the criterion to maximize when optimizing cue thresholds:
|
max.levels |
integer. The maximum number of nodes (or levels) considered for an FFT.
As all combinations of possible exit structures are considered, larger values of |
numthresh.method |
How should thresholds for numeric cues be determined (as character)?
|
numthresh.n |
The number of numeric thresholds to try (as integer).
Default: |
repeat.cues |
May cues occur multiple times within a tree (as logical)?
Default: |
stopping.rule |
A character string indicating the method to stop growing trees. Available options are:
All stopping methods use |
stopping.par |
numeric. A numeric parameter indicating the criterion value for the current |
sens.w |
A numeric value from |
cost.outcomes |
A list of length 4 specifying the cost value for one of the 4 possible classification outcomes.
The list elements must be named |
cost.cues |
A list containing the cost of each cue (in some common unit).
Each list element must have a name corresponding to a cue (i.e., a variable in |
main |
string. An optional label for the dataset. Passed on to other functions, like |
decision.labels |
A vector of strings of length 2 for the text labels for negative and positive decision/prediction outcomes
(i.e., left vs. right, noise vs. signal, 0 vs. 1, respectively, as character).
E.g.; |
my.goal |
The name of an optimization measure defined by |
my.goal.fun |
The definition of an outcome measure to optimize, defined as a function
of the frequency counts of the 4 basic classification outcomes |
my.tree |
A verbal description of an FFT, i.e., an "FFT in words" (as character string).
For example, |
object |
An optional existing |
tree.definitions |
An optional |
do.comp , do.lr , do.cart , do.svm , do.rf |
Should alternative algorithms be used for comparison (as logical)?
All options are set to
Specifying |
quiet |
A list of 4 logical arguments: Should detailed progress reports be suppressed?
Setting list elements to |
comp , force , rank.method , rounding , store.data , verbose |
Deprecated arguments (unused or replaced, to be retired in future releases). |
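Several arguments above (goal, sens.w, cost.outcomes, my.goal.fun) refer to statistics derived from the four classification outcomes. The base R sketch below shows how such measures can be computed from frequency counts. The names hi, fa, mi, and cr (hits, false alarms, misses, correct rejections), the default cost values, and the helper classification_stats are assumptions made for this illustration; consult the package's full argument documentation for the names and definitions it actually expects.

```r
# Illustrative helper: accuracy and cost measures from the 4 outcome counts.
classification_stats <- function(hi, fa, mi, cr, sens.w = 0.5,
                                 cost = list(hi = 0, fa = 1, mi = 1, cr = 0)) {
  n    <- hi + fa + mi + cr
  sens <- hi / (hi + mi)                          # sensitivity (hit rate)
  spec <- cr / (cr + fa)                          # specificity
  c(
    sens = sens,
    spec = spec,
    acc  = (hi + cr) / n,                         # overall accuracy
    bacc = (sens + spec) / 2,                     # balanced accuracy
    wacc = sens.w * sens + (1 - sens.w) * spec,   # sensitivity-weighted accuracy
    # mean outcome cost per case, given one cost value per outcome
    cost = (hi * cost$hi + fa * cost$fa + mi * cost$mi + cr * cost$cr) / n
  )
}

classification_stats(hi = 40, fa = 20, mi = 10, cr = 30)
classification_stats(hi = 40, fa = 20, mi = 10, cr = 30, sens.w = 0.75)
```

With sens.w = 0.5, the weighted accuracy in this sketch coincides with balanced accuracy; larger values of sens.w give more weight to sensitivity than to specificity.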
Value
An FFTrees object with the following elements:
- criterion_name
The name of the binary criterion variable (as character).
- cue_names
The names of all potential predictor variables (cues) in the data (as character).
- formula
The formula specified when creating the FFTs.
- trees
A list of FFTs created, with further details contained in n, best, definitions, inwords, stats, level_stats, and decisions.
- data
The original training and test data (if available).
- params
A list of defined control parameters (e.g., algorithm, goal, sens.w, as well as various thresholds, stopping rule, and cost parameters).
- competition
Models and classification statistics for competitive classification algorithms: logistic regression (lr), classification and regression trees (cart), random forests (rf), and support vector machines (svm).
- cues
A list of cue information, with further details contained in thresholds and stats.
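The elements listed above can be inspected directly with the $ operator. The following sketch assumes that the FFTrees package is installed and reuses the heart.train and heart.test data from the Examples section; do.comp = FALSE is set here only to skip the competing algorithms and keep the run short.

```r
library(FFTrees)

# Fit FFTs without the competing algorithms (faster).
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     do.comp = FALSE)

heart.fft$criterion_name      # name of the criterion variable
heart.fft$cue_names           # names of all potential cues
heart.fft$trees$n             # number of FFTs created
heart.fft$trees$definitions   # definitions of the FFTs
names(heart.fft$params)       # names of the stored control parameters
```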
See Also
print.FFTrees for printing FFTs;
plot.FFTrees for plotting FFTs;
summary.FFTrees for summarizing FFTs;
inwords for obtaining a verbal description of FFTs;
showcues for plotting cue accuracies.
Examples
# 1. Create fast-and-frugal trees (FFTs) for heart disease:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Healthy", "Diseased")
                     )
# 2. Print a summary of the result:
heart.fft # same as:
# print(heart.fft, data = "train", tree = "best.train")
# 3. Plot an FFT applied to training data:
plot(heart.fft) # same as:
# plot(heart.fft, what = "all", data = "train", tree = "best.train")
# 4. Apply FFT to (new) testing data:
plot(heart.fft, data = "test") # predict for Tree 1
plot(heart.fft, data = "test", tree = 2) # predict for Tree 2
# 5. Predict classes and probabilities for new data:
predict(heart.fft, newdata = heartdisease)
predict(heart.fft, newdata = heartdisease, type = "prob")
# 6. Create a custom tree (from verbal description) with my.tree:
custom.fft <- FFTrees(
formula = diagnosis ~ .,
data = heartdisease,
my.tree = "If age < 50, predict False.
If sex = 1, predict True.
If chol > 300, predict True, otherwise predict False.",
main = "My custom FFT")
# Plot the (pretty bad) custom tree:
plot(custom.fft)