TreeModelsAllSteps {LOGANTree} | R Documentation |
Data Partition and Tree-based Model Training
Description
Partitions the data into training and testing sets, then trains the selected tree-based models on the training set.
Usage
TreeModelsAllSteps(
data = NULL,
proportion = 0.7,
seed = 2022,
methodlist = c("dt", "rf", "gbm"),
iternumber = 10,
dt.gridsearch = NULL,
rf.gridsearch = NULL,
gbm.gridsearch = NULL,
checkprogress = FALSE
)
Arguments
data |
A |
proportion |
A numeric value for the proportion of data to be put into model training. Default is set to 0.7. |
seed |
A numeric value passed to set.seed. The default is 2022. |
methodlist |
A character vector of the tree-based methods to fit. The default is methodlist = c("dt", "rf", "gbm"). |
iternumber |
A numeric value for the number of resampling iterations (the number of folds) in the cross-validation scheme. The default is 10. |
dt.gridsearch |
A |
rf.gridsearch |
A |
gbm.gridsearch |
A |
checkprogress |
Logical. Print the modeling progress if it is TRUE. The default is FALSE. |
Details
This function performs all the steps of a predictive analysis. First, the data are partitioned into training and testing datasets using selection stratified by the outcome variable, as performed by the createDataPartition function from the caret package. Then, the selected classifiers are fitted to the training dataset under a cross-validation scheme. Users can choose which models to compare by specifying them in the methodlist argument. The caretEnsemble package is used in the modeling process to ensure that all models follow the same resampling procedure. For each tree-based method, the model with the largest ROC value is selected as the optimal one. Finally, a summary report is displayed.
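The stratified partition step described above can be illustrated with a minimal base-R sketch. This is not the package's code: according to the text, the function relies on createDataPartition from caret, whereas the sketch below only mimics the idea by sampling the same proportion of rows within each outcome class. The toy data frame and its column names are hypothetical; the cv_train and cv_test names are borrowed from the Value section.

```r
# Toy data with an imbalanced binary outcome (hypothetical, for illustration only)
set.seed(2022)
toy <- data.frame(
  x1   = rnorm(100),
  x2   = rnorm(100),
  perf = factor(rep(c("low", "high"), times = c(70, 30)))
)

proportion <- 0.7

# Stratified selection: within each outcome class, sample `proportion` of the rows
train_idx <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$perf),
  function(idx) idx[sample.int(length(idx), size = round(proportion * length(idx)))]
))

cv_train <- toy[train_idx, ]   # rows used for model training
cv_test  <- toy[-train_idx, ]  # held-out rows used for testing

# Both parts keep the 70/30 class ratio of the full data
prop.table(table(cv_train$perf))
prop.table(table(cv_test$perf))
```

Because the sampling is done per class, a rare outcome category cannot end up concentrated in the test set by chance, which is the reason for stratifying before cross-validation.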
Value
This function returns three lists:
DataPartition The partitioned datasets: training (cv_train) and testing (cv_test).
ModelObject An object with results from the selected models.
SummaryReport A data.frame with the summary of model parameters. The summary report is shown automatically in the output.
Examples
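# Drop column 14, then rename the column that moves into position 14 to "perf"
cp025q01.wgt <- cp025q01.wgt[, -14]
colnames(cp025q01.wgt)[14] <- "perf"

# Fit all three default methods ("dt", "rf", "gbm")
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 checkprogress = TRUE)

# Fit only the "dt" and "gbm" methods
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 methodlist = c("dt", "gbm"),
                                 checkprogress = TRUE)

# Fit only the "rf" method with a user-supplied tuning grid
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 methodlist = c("rf"),
                                 rf.gridsearch = data.frame(mtry = 2, splitrule = "gini", min.node.size = 1),
                                 checkprogress = TRUE)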
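The last example passes a one-row data.frame to rf.gridsearch. As a hedged sketch, a grid with several candidate settings could be built with base R's expand.grid, reusing the column names from that example (mtry, splitrule, min.node.size). The specific values below are hypothetical, and whether TreeModelsAllSteps accepts a multi-row grid is an assumption not confirmed by this page.

```r
# Hypothetical tuning grid: all combinations of two mtry values and two node sizes
rf_grid <- expand.grid(
  mtry             = c(2, 4),
  splitrule        = "gini",
  min.node.size    = c(1, 5),
  stringsAsFactors = FALSE
)

# One row per candidate parameter combination
rf_grid

# Assumed usage (not run here; requires LOGANTree and the example data):
# ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
#                                  methodlist = c("rf"),
#                                  rf.gridsearch = rf_grid,
#                                  checkprogress = TRUE)
```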