DF_CV {Dforest} | R Documentation |
Decision Forest algorithm: Model training with Cross-validation
Description
Trains a Decision Forest model with cross-validation. The default is 5-fold cross-validation.
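As a rough illustration of what k-fold cross-validation means here, the sketch below assigns samples to folds in base R. It is not taken from the Dforest source; `make_folds` is a hypothetical helper, and the package may partition samples differently.

```r
# Hypothetical helper (not part of Dforest): give each of n samples a fold label 1..k.
make_folds <- function(n, k = 5) {
  # Repeat 1..k until n labels exist, then shuffle so fold membership is random.
  sample(rep(seq_len(k), length.out = n))
}

set.seed(1)
folds <- make_folds(nrow(iris), k = 5)

# In round i, fold i is held out for prediction and the other folds are used for training.
for (i in 1:5) {
  test_idx  <- which(folds == i)
  train_idx <- which(folds != i)
  # a model would be fitted on iris[train_idx, ] and evaluated on iris[test_idx, ]
}
```

With 150 rows and 5 folds, every sample is held out exactly once.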
Usage
DF_CV(X, Y, stop_step = 10, CV_fold = 5, Max_tree = 20, min_split = 10,
cp = 0.1, Filter = F, p_val = 0.05, Method = "bACC", Quiet = T,
Grace_val = 0.05, imp_accu_val = 0.01, imp_accu_criteria = F)
Arguments
X |
Training Dataset |
Y |
Training data endpoint (the class label for each sample in X) |
stop_step |
Number of extra steps to process when performance has not improved; 1 means one extra step (default is 10) |
CV_fold |
Number of cross-validation folds (default is 5) |
Max_tree |
Maximum number of trees in the forest (default is 20) |
min_split |
Minimum leaves in tree nodes (default is 10) |
cp |
Complexity parameter used to prune the decision trees (default is 0.1) |
Filter |
If TRUE, perform feature selection before training (default is FALSE) |
p_val |
P-value threshold measured by t-test used in feature selection, default is 0.05 |
Method |
Metric used to evaluate the training process. MIS: misclassification rate; ACC: accuracy; bACC: balanced accuracy (default) |
Quiet |
If TRUE (default), suppress all messages during the process |
Grace_val |
Grace value in evaluation: the next model's performance (accuracy, bACC, MCC) must be no worse than the previous model's by more than this threshold (default is 0.05) |
imp_accu_val |
Improvement in evaluation: adding a new tree should improve the overall model performance (accuracy, bACC, MCC) by at least this threshold (default is 0.01) |
imp_accu_criteria |
If TRUE, the model must show an improvement in accumulated accuracy (default is FALSE) |
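The metric abbreviations used by Method and Grace_val can be made concrete with a small sketch for the two-class case. This is an illustration under assumptions, not Dforest's internal code: `binary_metrics` and `within_grace` are hypothetical helpers, and the grace check shows only one plausible reading of the Grace_val description.

```r
# Hypothetical helper (not part of Dforest): common metrics for a two-class problem.
binary_metrics <- function(truth, pred, positive) {
  # Confusion-matrix counts, stored as doubles to avoid integer overflow in the MCC product.
  tp <- as.numeric(sum(truth == positive & pred == positive))
  tn <- as.numeric(sum(truth != positive & pred != positive))
  fp <- as.numeric(sum(truth != positive & pred == positive))
  fn <- as.numeric(sum(truth == positive & pred != positive))

  acc  <- (tp + tn) / length(truth)
  sens <- tp / (tp + fn)   # true positive rate
  spec <- tn / (tn + fp)   # true negative rate
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  mcc <- if (denom == 0) 0 else (tp * tn - fp * fn) / denom

  c(ACC = acc, MIS = 1 - acc, bACC = (sens + spec) / 2, MCC = mcc)
}

# Assumed reading of Grace_val: a new model is acceptable if its score is
# no more than grace_val below the previous model's score.
within_grace <- function(prev_perf, new_perf, grace_val = 0.05) {
  new_perf >= prev_perf - grace_val
}

truth <- c("a", "a", "a", "b", "b", "b", "b", "b")
pred  <- c("a", "a", "b", "b", "b", "b", "b", "a")
m <- binary_metrics(truth, pred, positive = "a")
print(round(m, 3))
```

Balanced accuracy averages sensitivity and specificity, so it is less affected by class imbalance than plain accuracy.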
Value
.$performance: Overall training accuracy (Cross-validation)
.$pred: Detailed training prediction (Cross-validation)
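.$detail: Details of the decision tree features and models used, and their performance, across all cross-validation runs
.$Method: Evaluation method used in training (passed through from the Method argument)
.$cp: cp value used in training the decision trees (passed through from the cp argument)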
Examples
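library(Dforest)
# iris ships with R's datasets package, so this call is optional
data(iris)
X = iris[,1:4]
Y = iris[,5]
names(Y)=rownames(X)
# Shuffle the row indices and deal them into split_rate groups
random_seq=sample(nrow(X))
split_rate=3
split_sample = suppressWarnings(split(random_seq,1:split_rate))
# Hold out the first group (split_sample[[1]] already holds row indices)
# and train on the remaining two thirds
Train_X = X[-split_sample[[1]],]
Train_Y = Y[-split_sample[[1]]]
# Run the default 5-fold cross-validation on the training portion
CV_result = DF_CV(Train_X, Train_Y)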
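
The returned object can then be inspected through the element names listed under Value. The sketch below assumes the Dforest package is installed and is skipped otherwise. The internal shape of each element is not described on this page, so only the top level is shown.

```r
# Runs only if Dforest is available; otherwise the block is skipped.
if (requireNamespace("Dforest", quietly = TRUE)) {
  X <- iris[, 1:4]
  Y <- iris[, 5]
  names(Y) <- rownames(X)

  CV_result <- Dforest::DF_CV(X, Y)

  # Top-level overview of the returned list.
  str(CV_result, max.level = 1)

  # Element names as listed in the Value section.
  print(CV_result$performance)  # overall cross-validated performance
  print(CV_result$Method)       # evaluation method passed through
  print(CV_result$cp)           # cp value passed through
}
```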