idaTree {ibmdbR}

R Documentation

Decision and Regression tree

Description

This function generates a tree model based on the contents of an IDA data frame (ida.data.frame).

Usage

idaTree( form, data, id, minsplit=50, maxdepth=10, qmeasure=NULL,
         minimprove=0.01, eval=NULL, valtable=NULL, modelname=NULL)

## S3 method for class 'idaTree'
plot(x,...)  
## S3 method for class 'idaTree'
predict(object, newdata, id, ...)

Arguments

`form`	A `formula` object that specifies both the name of the column that contains the target variable (categorical for a decision tree, numeric for a regression tree) and either a list of columns separated by plus symbols (each column corresponds to one predictor variable) or a single period (to specify that all other columns in the IDA data frame are to be used as predictors).
`data`	An IDA data frame that contains the input data for the function. The input IDA data frame must include a column that contains a unique ID for each row.
`id`	The name of the column that contains a unique ID for each row of the input data.
`minsplit`	The minimum number of rows a node must contain to be split further.
`maxdepth`	The maximum depth (that is, the number of hierarchical levels) of the generated tree.
`qmeasure`	The measure that is to be used to prune the tree. For a decision tree, allowed values are `"Acc"` (this is the default) and `"wAcc"`. For a regression tree, allowed values are `"mse"` (this is the default), `"r2"`, `"pearson"`, and `"spearman"`.
`minimprove`	The minimum improvement in impurity. A node is not split further unless the split improves the impurity by at least the amount specified for this parameter.
`eval`	The criterion that is to be used to calculate each split. For a decision tree, allowed values are `"entropy"` (this is the default) and `"gini"`. For a regression tree, the only allowed value is `"variance"` (this is the default).
`valtable`	When the output tree is to be pruned using external data, use this parameter to specify the fully-qualified name of the table that contains that data. Otherwise, specify `NULL` (the default).
`modelname`	The name under which the model is stored in the database. This is the name that is specified when using functions such as `idaRetrieveModel` or `idaDropModel`.
`object`	An object of the class `idaTree`.
`x`	An object of the class `idaTree`.
`newdata`	An IDA data frame that contains the data to which the model is to be applied.
`...`	Additional arguments to be passed to `plot` or `predict`.
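The `eval` criteria name standard impurity measures. The following base-R sketch uses textbook definitions to show what each one measures for a single node. It is an illustration, not the in-database implementation; the function names `entropy_impurity`, `gini_impurity`, and `variance_impurity` are hypothetical, and the use of log base 2 for entropy is an assumption.

```r
# Illustrative node-impurity functions for the eval criteria.
# Textbook definitions; assumed, not confirmed, to match what the
# database computes for "entropy", "gini", and "variance".

# Entropy of a vector of class labels, in bits (log base 2 is an assumption).
entropy_impurity <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions
  p <- p[p > 0]                        # drop empty classes to avoid 0 * log(0)
  -sum(p * log2(p))
}

# Gini impurity of a vector of class labels.
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Variance impurity of a numeric target
# (population variance, not the n - 1 form used by var()).
variance_impurity <- function(y) {
  mean((y - mean(y))^2)
}

labels <- c("setosa", "setosa", "versicolor", "virginica")
entropy_impurity(labels)    # 1.5
gini_impurity(labels)       # 0.625
variance_impurity(c(1, 3))  # 1
```

All three measures are 0 for a node whose target values are identical, which is why such a node cannot be improved by a further split.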

Details

The idaTree function uses a top-down, iterative procedure to generate a decision-tree or regression-tree model, depending on the type of the target variable. The resulting model comprises a network of nodes and connectors, and each subnode is the endpoint of a binary split.
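One step of this top-down procedure can be sketched as an exhaustive search over binary splits of a single predictor. The base-R sketch below assumes a numeric predictor and a numeric target (the regression-tree case) with no missing values, scores each candidate threshold by the reduction in variance, and returns the best one. `best_binary_split` is a hypothetical helper, not part of ibmdbR, and the in-database algorithm may choose candidate thresholds differently.

```r
# Variance impurity of a numeric target (population variance).
variance_impurity <- function(y) mean((y - mean(y))^2)

# Hypothetical helper: best binary split "x <= t" vs "x > t" for one numeric
# predictor x and numeric target y, scored by variance reduction.
# Assumes x and y contain no missing values.
best_binary_split <- function(x, y) {
  ux <- sort(unique(x))
  if (length(ux) < 2) return(NULL)  # a constant predictor cannot be split
  # Candidate thresholds: midpoints between consecutive distinct values.
  thresholds <- (head(ux, -1) + tail(ux, -1)) / 2
  parent <- variance_impurity(y)
  n <- length(y)
  improvement <- vapply(thresholds, function(t) {
    left  <- y[x <= t]
    right <- y[x > t]
    # Impurity of the two subnodes, weighted by their share of rows.
    children <- (length(left) * variance_impurity(left) +
                 length(right) * variance_impurity(right)) / n
    parent - children
  }, numeric(1))
  best <- which.max(improvement)
  list(threshold = thresholds[best], improvement = improvement[best])
}

best_binary_split(1:6, c(1, 1, 1, 5, 5, 5))  # threshold 3.5, improvement 4
```

Growing a full tree would run this search for every predictor, keep the best split overall, and then repeat the process on each of the two resulting subnodes until one of the stopping conditions applies.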

A node is not split further when any of the following are true:

The node has a uniform class (and therefore cannot be split further).
Additional splits do not improve the class impurity by at least the amount specified by minimprove.
The number of rows contained by the node is less than the value specified by minsplit.
The tree depth reaches the value specified by maxdepth.
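The four stopping conditions can be combined into a single check. This sketch uses a hypothetical function `should_stop` whose defaults mirror the idaTree defaults for minsplit, maxdepth, and minimprove; the order in which the database evaluates these conditions is not documented here and may differ.

```r
# Hypothetical helper mirroring the four stopping conditions for a node.
# labels:           target values of the rows in the node
# depth:            depth of the node in the tree
# best_improvement: impurity improvement of the best candidate split
# Defaults mirror the idaTree defaults (minsplit = 50, maxdepth = 10,
# minimprove = 0.01).
should_stop <- function(labels, depth, best_improvement,
                        minsplit = 50, maxdepth = 10, minimprove = 0.01) {
  length(unique(labels)) == 1 ||    # uniform class: nothing left to separate
    length(labels) < minsplit ||    # too few rows to split
    depth >= maxdepth ||            # depth limit reached
    best_improvement < minimprove   # best split does not improve enough
}

node <- rep(c("a", "b"), 40)                              # 80 rows, two classes
should_stop(node, depth = 3, best_improvement = 0.2)      # FALSE
should_stop(node[1:20], depth = 3, best_improvement = 0.2) # TRUE (fewer than 50 rows)
```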

If the variable that is used to determine a split has no value for a row, that row remains in the node that is being split instead of being passed to either subnode.
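The missing-value rule can be illustrated with a hypothetical helper, `route_rows`, that assigns each row of a node to the left subnode, the right subnode, or keeps it in the parent. It assumes a numeric split variable and a split of the form `x <= threshold`; it is not part of ibmdbR.

```r
# Hypothetical illustration of the missing-value rule: rows whose split
# variable is NA are not sent to either subnode; they stay in the parent.
route_rows <- function(x, threshold) {
  side <- ifelse(x <= threshold, "left", "right")  # NA in x gives NA here
  side[is.na(x)] <- "parent"                       # missing value: row stays put
  side
}

route_rows(c(2.5, NA, 7.1), threshold = 5)  # "left" "parent" "right"
```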

The output of the print function for an idaTree object is a textual description of the corresponding model.

The output of the plot function for an idaTree object is a graphical representation of the corresponding model.

Models are stored persistently in the database under the name modelname. Model names cannot have more than 64 characters and cannot contain white space. Unquoted model names are converted to uppercase by default; to preserve lowercase or mixed-case characters, quote the name in the same way as a table name. Only one model with a given name can exist in the database at a time: if a model named modelname already exists, drop it with idaDropModel before creating another model with the same name. The model name can be used later to retrieve the model with idaRetrieveModel.
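The naming rules can be sketched with two hypothetical helpers, `effective_model_name` and `is_valid_model_name`, neither of which is part of ibmdbR. They assume that model names follow the usual SQL convention for delimited identifiers (a name enclosed in double quotes keeps its case, an unquoted name is folded to uppercase) and that the 64-character limit applies to the name without its quotes.

```r
# Hypothetical helpers sketching the model-naming rules.
# Assumption: SQL delimited-identifier conventions, i.e. a double-quoted
# name keeps its case and an unquoted name is folded to uppercase.
effective_model_name <- function(name) {
  if (grepl('^".*"$', name)) {
    substr(name, 2, nchar(name) - 1)  # quoted: strip the quotes, keep case
  } else {
    toupper(name)                     # unquoted: uppercased by default
  }
}

# Assumption: the 64-character limit applies to the unquoted name.
is_valid_model_name <- function(name) {
  core <- effective_model_name(name)
  nchar(core) > 0 && nchar(core) <= 64 && !grepl("[[:space:]]", core)
}

effective_model_name("myTree")        # "MYTREE"
effective_model_name('"myTree"')      # "myTree"
is_valid_model_name("my tree")        # FALSE (contains white space)
is_valid_model_name(strrep("A", 65))  # FALSE (longer than 64 characters)
```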

The predict.idaTree method applies the model to the data in newdata and returns an IDA data frame in which each row (tuple) comprises one row ID and the prediction for that row.
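To judge these predictions against known target values, the measures named by the qmeasure argument can be computed locally once the actual and predicted values are available as ordinary R vectors (for example, after joining the predictions to the input data on the ID column). The sketch below uses textbook definitions; treating "wAcc" as the mean of the per-class accuracies is an assumption, and the database may compute these measures differently. The function names are hypothetical.

```r
# Illustrative quality measures for the qmeasure values, on plain R vectors.
# Classification (decision tree):
acc <- function(actual, predicted) mean(actual == predicted)
# Assumption: "wAcc" taken as the mean of per-class accuracies.
wacc <- function(actual, predicted) {
  per_class <- tapply(actual == predicted, actual, mean)  # accuracy per true class
  mean(per_class, na.rm = TRUE)                           # ignore empty classes
}

# Regression (regression tree):
mse <- function(actual, predicted) mean((actual - predicted)^2)
r2 <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}
pearson  <- function(actual, predicted) cor(actual, predicted)
spearman <- function(actual, predicted) cor(actual, predicted, method = "spearman")

acc(c("a", "a", "b"), c("a", "b", "b"))   # 0.6666667
wacc(c("a", "a", "b"), c("a", "b", "b"))  # 0.75
mse(c(1, 2, 3), c(1, 2, 5))               # 1.333333
r2(c(1, 2, 3), c(1, 2, 5))                # -1
```

A negative r2, as in the last line, means the predictions fit worse than simply predicting the mean of the actual values.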

Value

The idaTree function returns an object of classes idaTree and rpart.

Examples

## Not run: 

#Create a pointer to the table IRIS
idf <- ida.data.frame('IRIS')

#Create a tree model
tr <- idaTree(Species~.,idf,"ID",modelname="MYTREEMODEL")

#Print the model
print(tr)

#Plot the model
plot(tr)

#Apply the model to data
pred <- predict(tr,idf,id="ID")

#Inspect the predictions
head(pred)

#Drop the model so that the name MYTREEMODEL can be reused
idaDropModel("MYTREEMODEL")


## End(Not run)

[Package ibmdbR version 1.51.0 Index]