idaTree {ibmdbR} | R Documentation |
Decision and Regression tree
Description
This function generates a tree model based on the contents of an IDA data frame (ida.data.frame).
Usage
idaTree( form, data, id, minsplit=50, maxdepth=10, qmeasure=NULL,
minimprove=0.01, eval=NULL, valtable=NULL, modelname=NULL)
## S3 method for class 'idaTree'
plot(x,...)
## S3 method for class 'idaTree'
predict(object, newdata, id, ...)
Arguments
form |
A |
data |
An IDA data frame that contains the input data for the function. The input IDA data frame must include a column that contains a unique ID for each row. |
id |
The name of the column that contains a unique ID for each row of the input data. |
minsplit |
The minimum number of rows a node must contain to be split further. |
maxdepth |
The maximum depth (that is, the number of hierarchical levels) of the generated tree. |
qmeasure |
The measure that is to be used to prune the tree.
For a decision tree, allowed values are |
minimprove |
The minimum improvement. A node is not split further unless the split improves the class impurity by at least the amount specified for this parameter. |
eval |
The criterion that is to be used to calculate each split.
For a decision tree, allowed values are |
valtable |
When the output tree is to be pruned using external data, use this parameter to specify the fully-qualified name of the table that contains that data. Otherwise, specify NULL. |
modelname |
The name under which the model is stored in the database.
This is the name that is specified when using functions such as |
object |
An object of the class idaTree. |
x |
An object of the class idaTree. |
newdata |
An IDA data frame that contains the data to which to apply the model. |
... |
Additional arguments to be passed to plot or predict. |
Details
The idaTree function uses a top-down, iterative procedure to generate a decision-tree or regression-tree model, depending on the type of the target variable. The resulting model comprises a network of nodes and connectors, and each subnode is the endpoint of a binary split.
A node is not split further when any of the following are true:
The node has a uniform class (and therefore cannot be split further).
Additional splits do not improve the class impurity by at least the amount specified by minimprove.
The number of rows contained by the node is less than the value specified by minsplit.
The tree depth reaches the value specified by maxdepth.
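The stopping rules above can be sketched in plain R. This is only an illustration of the listed conditions, not the in-database procedure that idaTree actually runs. The helper name should_stop and its arguments labels, depth, and improvement are hypothetical; the defaults for minsplit, maxdepth, and minimprove are taken from the Usage section.

```r
# Illustrative only: a hypothetical helper that mirrors the stopping rules
# listed above. It is not part of ibmdbR.
should_stop <- function(labels, depth, improvement,
                        minsplit = 50, maxdepth = 10, minimprove = 0.01) {
  uniform   <- length(unique(labels)) <= 1   # node has a uniform class
  too_small <- length(labels) < minsplit     # fewer rows than minsplit
  too_deep  <- depth >= maxdepth             # depth has reached maxdepth
  no_gain   <- improvement < minimprove      # best split improves too little
  uniform || too_small || too_deep || no_gain
}

# A node with 120 rows of mixed classes at depth 3 whose best split
# improves the impurity by 0.05: none of the rules applies.
should_stop(rep(c("a", "b"), 60), depth = 3, improvement = 0.05)

# The same node when the best improvement is only 0.001: the minimprove
# rule applies and the node is not split.
should_stop(rep(c("a", "b"), 60), depth = 3, improvement = 0.001)
```

Raising minsplit or minimprove, or lowering maxdepth, makes these conditions trigger earlier and therefore yields smaller trees.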
If the variable that is used to determine a split does not have a value for a given row, that row remains in the node that is being split.
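The handling of missing values can be pictured with a small plain-R sketch. It assumes a numeric split variable and a "less than or equal to" split rule, both chosen here only for illustration; the function route_rows and the sample vector are hypothetical and do not reflect how idaTree is implemented in the database.

```r
# Illustrative only: hypothetical routing of rows at a binary split on a
# numeric variable. Rows with a missing split value stay in the parent node.
route_rows <- function(values, threshold) {
  ifelse(is.na(values), "stay",
         ifelse(values <= threshold, "left", "right"))
}

# Two of the five rows have no value for the split variable.
petal_length <- c(1.4, NA, 4.7, 5.1, NA)
route_rows(petal_length, threshold = 2.5)
```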
The output of the print function for an idaTree object is a textual description of the corresponding model.
The output of the plot function for an idaTree object is a graphical representation of the corresponding model.
Models are stored persistently in the database under the name modelname. Model names cannot have more than 64 characters and cannot contain white space. They need to be quoted like table names; otherwise they are converted to upper case by default. Only one model with a given name is allowed in the database at a time. If a model named modelname already exists, you must drop it with idaDropModel before you can create another one with the same name. The model name can be used to retrieve the model later (idaRetrieveModel).
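The two naming rules stated above (at most 64 characters, no white space) can be checked on the client before calling idaTree. The following is a minimal sketch; is_valid_modelname is a hypothetical helper, not an ibmdbR function, and it does not check quoting or whether the name is already in use.

```r
# Illustrative only: a hypothetical client-side check of the naming rules
# stated above (at most 64 characters, no white space).
is_valid_modelname <- function(name) {
  is.character(name) && length(name) == 1 &&
    nchar(name) <= 64 && !grepl("[[:space:]]", name)
}

is_valid_modelname("MYTREEMODEL")    # satisfies both rules
is_valid_modelname("my tree model")  # contains white space
```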
The predict.idaTree method applies the model to the data in a table and returns an IDA data frame that contains a list of tuples, each of which comprises one row ID and one prediction.
Value
The idaTree function returns an object of classes idaTree and rpart.
See Also
idaRetrieveModel, idaDropModel, idaListModels
Examples
## Not run:
#Create a pointer to the table IRIS
idf <- ida.data.frame('IRIS')
#Create a tree model
tr <- idaTree(Species~.,idf,"ID",modelname="MYTREEMODEL")
#Print the model
print(tr)
#Plot the model
plot(tr)
#Apply the model to data
pred <- predict(tr,idf,id="ID")
#Inspect the predictions
head(pred)
## End(Not run)