CDdiagram.BD {performanceEstimation}		R Documentation
CD diagrams for the post-hoc Bonferroni-Dunn test
Description
This function obtains a Critical Difference (CD) diagram for the post-hoc Bonferroni-Dunn test along the lines defined by Demsar (2006). These diagrams provide an interesting visualization of the statistical significance of the observed paired differences between a set of workflows and a selected baseline workflow. They allow us to compare a set of alternative workflows against this baseline and to answer the question of whether the observed differences are statistically significant.
Usage
CDdiagram.BD(r, metric = names(r)[1])
Arguments
r
A list resulting from a call to pairedComparisons.
metric
The metric for which the CD diagram will be obtained (defaults to the first metric of the comparison).
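The typical call sequence is sketched below (a minimal sketch: res stands for a ComparisonResults object produced by performanceEstimation, and the baseline name "svm.v2" is illustrative):

pres <- pairedComparisons(res, "svm.v2")   # paired comparisons against an illustrative baseline
CDdiagram.BD(pres, metric = "err")         # CD diagram for the "err" metric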
Details
Critical Difference (CD) diagrams are succinct visualizations of the results of a Bonferroni-Dunn post-hoc test, which is designed to check the statistical significance of the differences in average rank between a set of workflows and a baseline workflow, on a set of predictive tasks.
In the resulting graph each workflow is represented by a colored line. The point on the X axis where a line ends represents the average rank position of the respective workflow across all tasks. The null hypothesis is that the average rank of the baseline workflow does not differ with statistical significance (at some confidence level defined in the call to pairedComparisons that creates the object used to obtain these graphs) from the average ranks of the set of alternative workflows. A horizontal line connects the baseline workflow with the alternative workflows for which we cannot reject this hypothesis. This means that only the alternative workflows that are not connected with the baseline can be considered as having an average rank that differs from that of the baseline with statistical significance. To help spot these differences, the name of the baseline workflow is shown in bold, and the names of the alternative workflows whose difference is significant are shown in italics.
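For reference, the critical difference underlying these diagrams can be computed from the number of workflows and tasks following Demsar (2006). The sketch below is illustrative and not part of this package's API; q.alpha is the critical value of the two-tailed Bonferroni-Dunn test for the chosen significance level and number of workflows:

## critical difference of the Bonferroni-Dunn test (Demsar, 2006):
## CD = q.alpha * sqrt(k * (k + 1) / (6 * N))
## k: number of workflows (baseline included); N: number of tasks
bdCD <- function(q.alpha, k, N) q.alpha * sqrt(k * (k + 1) / (6 * N))
## e.g., bdCD(q.alpha = 2.690, k = 8, N = 3)   # q.alpha value is illustrative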
Value
Nothing, the graph is drawn on the current device.
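Since the diagram is drawn on the current graphics device, it can be redirected to a file with any standard R graphics device; a minimal sketch (assuming pres is the result of a call to pairedComparisons, and the file name is illustrative):

pdf("cd_bd.pdf")                     # open a file-based graphics device
CDdiagram.BD(pres, metric = "err")   # draw the diagram into the file
dev.off()                            # close the device, writing the file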
Author(s)
Luis Torgo ltorgo@dcc.fc.up.pt
References
Demsar, J. (2006) Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, 7, 1-30.
Torgo, L. (2014) An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R. arXiv:1412.0436 [cs.MS] http://arxiv.org/abs/1412.0436
See Also
CDdiagram.Nemenyi
,
CDdiagram.BD
,
signifDiffs
,
metricNames
,
performanceEstimation
,
topPerformers
,
topPerformer
,
rankWorkflows
,
metricsSummary
,
ComparisonResults
Examples
## Not run:
## Estimating the error and accuracy of eight SVM variants
## on three classification tasks, using one repetition
## of 10-fold CV
library(e1071)
data(iris)
data(Satellite,package="mlbench")
data(LetterRecognition,package="mlbench")
## running the estimation experiment
res <- performanceEstimation(
  c(PredTask(Species ~ .,iris),
    PredTask(classes ~ .,Satellite,"sat"),
    PredTask(lettr ~ .,LetterRecognition,"letter")),
  workflowVariants(learner="svm",
                   learner.pars=list(cost=1:4,gamma=c(0.1,0.01))),
  EstimationTask(metrics=c("err","acc"),method=CV()))
## checking the top performers
topPerformers(res)
## now let us assume that we will choose "svm.v2" as our baseline
## carry out the paired comparisons
pres <- pairedComparisons(res,"svm.v2")
## obtaining a CD diagram comparing all workflows against
## the baseline (defined in the previous call to pairedComparisons)
CDdiagram.BD(pres,metric="err")
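## the same comparison can be drawn for the accuracy metric,
## which was also estimated above
CDdiagram.BD(pres,metric="acc")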
## End(Not run)