DTModel {PredPsych} | R Documentation |
Generic Decision Tree Function
Description
A simple function to create Decision Trees
Usage
DTModel(Data, classCol, selectedCols, tree, cvType, nTrainFolds,
ntrainTestFolds, modelTrainFolds, foldSep, cvFraction,
extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
NewData = NULL, ...)
Arguments
Data |
(dataframe) a data frame with regressors and response |
classCol |
(numeric or string) which column to use as the response column |
selectedCols |
(optional) (numeric or string) which columns to treat as data (features + response); defaults to all columns |
tree |
which decision tree model to implement; One of the following values:
|
cvType |
(optional) (string) which type of cross-validation scheme to follow - only in case of CARTCV or CARTNACV; One of the following values:
|
nTrainFolds |
(optional) (parameter for k-fold cross-validation only) Number of folds into which the training dataset is further divided |
ntrainTestFolds |
(optional) (parameter for k-fold cross-validation only) Number of folds for the training and testing datasets |
modelTrainFolds |
(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training |
foldSep |
(numeric) (parameter for leave-one-subject-out cross-validation only) mandatory column number identifying the subject for leave-one-subject-out cross-validation |
cvFraction |
(optional) (numeric) Fraction of data to keep for training data |
extendedResults |
(optional) (logical) Return extended results with model and other metrics |
SetSeed |
(optional) (logical) Whether to set the seed. Use SetSeed to seed the random number generator and get consistent results |
silent |
(optional) (logical) whether to print messages or not |
NewData |
(optional) (dataframe) New Data frame features for which the class membership is requested |
... |
(optional) additional arguments for the function |
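The cross-validation arguments above can be illustrated with a minimal base R sketch. It is not DTModel's internal code: the built-in `iris` data, the simple random (unstratified) split, and the variable names `train_idx`, `folds`, and `k` are assumptions made for illustration. It shows what a holdout split with `cvFraction = 0.8` and a k-fold assignment look like in principle.

```r
# Holdout: keep a fraction of the rows for training, the rest for testing
set.seed(111)                       # same idea as SetSeed = TRUE
cvFraction <- 0.8                   # fraction of data kept for training
n <- nrow(iris)
train_idx <- sample(n, size = floor(cvFraction * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# k-fold: assign every row to one of k folds of (nearly) equal size
k <- 10                             # hypothetical number of train/test folds
folds <- sample(rep(seq_len(k), length.out = n))
fold_sizes <- table(folds)

# One round of k-fold: fold 1 is the test set, the other folds are training
test_rows  <- which(folds == 1)
train_rows <- which(folds != 1)
```

In a full k-fold run, each fold takes a turn as the test set and the accuracies are averaged across rounds.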
Details
The function implements Decision Tree models (DT models). DT models fall under the general "Tree based methods" involving generation of a recursive binary tree (Hastie et al., 2009). In terms of input, DT models can handle both continuous and categorical variables as well as missing data. From the input data, DT models build a set of logical "if ... then" rules that permit accurate prediction of the input cases.
The function "rpart" handles missing data by creating surrogate variables instead of removing the incomplete cases entirely (Therneau & Atkinson, 1997). This can be useful when the data contain multiple missing values.
Compared with regression methods such as GLMs, Decision Trees are more flexible and can model nonlinear interactions.
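The surrogate behaviour described above can be seen by calling rpart directly. This is a minimal sketch, not DTModel itself: it assumes the rpart package is installed, uses the built-in `iris` data, and introduces missing values artificially. Whether DTModel passes the same rpart defaults is an assumption.

```r
library(rpart)

set.seed(1)
dat <- iris
# Artificially blank out 15 values of one predictor to mimic missing data
dat$Petal.Length[sample(nrow(dat), 15)] <- NA

# rpart keeps rows with missing predictors and records surrogate splits
fit <- rpart(Species ~ ., data = dat, method = "class")

# Rows with NA in a split variable are still routed down the tree
pred <- predict(fit, newdata = dat, type = "class")
acc  <- mean(pred == dat$Species)   # training accuracy, for illustration only
```

Calling `summary(fit)` lists the surrogate splits chosen at each node.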
Value
Model result for the input tree (Results), or test accuracy (accTest), depending on tree. If extendedResults = TRUE, outputs the test accuracy of discrimination (accTest), the confusion matrices (ConfMatrix), the model (fit), and the overall cross-validated confusion matrix results (ConfusionMatrixResults).
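To make the returned quantities concrete, the following sketch computes a test accuracy and a confusion matrix by hand for a holdout split. It calls rpart directly on the built-in `iris` data; the names `conf_matrix` and `acc_test` are illustrative and are not the objects returned by DTModel.

```r
library(rpart)

set.seed(42)
train_idx <- sample(nrow(iris), size = 120)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a classification tree on the training rows only
fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix: rows are predicted classes, columns are true classes
conf_matrix <- table(Predicted = pred, Actual = test$Species)

# Test accuracy: proportion of test cases on the diagonal
acc_test <- sum(diag(conf_matrix)) / sum(conf_matrix)
```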
Author(s)
Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia
References
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics (2nd ed., Vol. 1). New York, NY: Springer New York.
Terry Therneau, Beth Atkinson and Brian Ripley (2015). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10. https://CRAN.R-project.org/package=rpart
Therneau, T. M., & Atkinson, E. J. (1997). An introduction to recursive partitioning using the RPART routines (Vol. 61, p. 452). Mayo Foundation: Technical report.
Examples
# generate a CART model with holdout cross-validation (default cvFraction = 0.8)
model <- DTModel(Data = KinData,classCol=1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112), tree='CARTCV',cvType = "holdout")
# Output:
# Performing Decision Tree Analysis
#
# [1] "Generating crossvalidated Tree With Missing Values"
#
# Performing holdout Cross-validation
#
# cvFraction was not specified,
# Using default value of 0.8 (cvFraction = 0.8)"
# Proportion of Test/Train Data was : 0.2470588
#
# [1] "Test holdout Accuracy is 0.62"
# holdout CART Analysis:
# cvFraction : 0.8
# Test Accuracy 0.62
# *Legend:
# cvFraction = Fraction of data to keep for training data
# Test Accuracy = Accuracy from the Testing dataset
# --CART MOdel --
# Alternate uses:
# k-fold cross-validation, removing missing values
model <- DTModel(Data = KinData,classCol=1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
tree='CARTNACV',cvType="folds")
# holdout cross-validation without removing missing values
model <- DTModel(Data = KinData,classCol=1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
tree='CARTCV',cvType = "holdout")
# k-fold cross-validation without removing missing values
model <- DTModel(Data = KinData,classCol=1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
tree='CARTCV',cvType="folds")