riskchart {rattle} | R Documentation |
Plot a risk chart
Description
Plots a Rattle Risk Chart for binary classification models using ggplot2. The chart was developed in a practical context to present the performance of data mining models to clients. It plots caseload against performance, so that a client can see the tradeoff between coverage and performance.
Usage
riskchart(pr,
ac,
ri = NULL,
title = "Risk Chart",
title.size = 10,
subtitle = NULL,
caption = TRUE,
show.legend = TRUE,
optimal = NULL,
optimal.label = "",
chosen = NULL,
chosen.label = "",
include.baseline = TRUE,
dev = "",
filename = "",
show.knots = NULL,
show.lift = TRUE,
show.precision = TRUE,
show.maximal = TRUE,
risk.name = "Risk",
recall.name = "Recall",
precision.name = "Precision",
thresholds = NULL,
legend.horiz = TRUE)
Arguments
pr |
The predicted class for each observation. The example below passes the predicted probability of the positive class. |
ac |
The actual class for each observation. |
ri |
The risk class for each observation. |
title |
the main title to place at the top of the plot. |
title.size |
font size for the main title. |
subtitle |
subtitle under the main title. |
caption |
caption for the bottom right of plot. |
show.legend |
whether to display the legend in the plot. |
optimal |
a caseload (percentage or fraction) that represents an
optimal performance point which is also plotted. If instead the value
is |
optimal.label |
a string which is added to label the line drawn as the optimal point. |
chosen |
a caseload (percentage or fraction) that represents a user-chosen performance point which is also plotted. |
chosen.label |
a string which is added to label the line drawn as the chosen point. |
include.baseline |
if TRUE (the default) then display the diagonal baseline. |
dev |
a string which, if supplied, identifies a device type as
the target for the plot. This might be one of |
filename |
a string naming a file. If |
show.knots |
a vector of caseload values at which a vertical line should be drawn. These might correspond, for example, to individual paths through a decision tree, illustrating the impact of each path on the caseload and performance. |
show.lift |
whether to label the right axis with lift. |
show.precision |
whether to show the precision plot. |
show.maximal |
whether to show the maximal performance line. |
risk.name |
a string used within the plot's legend that gives a
name to the risk. Often the risk is a dollar amount at risk from a
fraud or from a bank loan point of view, so the default is
|
recall.name |
a string used within the plot's legend that gives a
name to the recall. The recall is often the percentage of cases that
are positive hits, and in practice these might correspond to known
cases of fraud or reviews where some adjustment to perhaps an income tax
return or application for credit had to be made on reviewing the case,
and so the default is |
precision.name |
a string used within the plot's legend that gives
a name to the precision. A common name for precision is |
thresholds |
whether to display scores along the top axis. |
legend.horiz |
whether to display a horizontal legend. |
Details
Caseload is the percentage of the entities in the dataset covered by the model at a particular probability cutoff. With a cutoff of 0, all of the entities (100%) are covered by the model. With a cutoff of 1, no entities (0%) are covered. A diagonal line is drawn to represent baseline random performance. The percentage of positive cases covered at each caseload (the recall) is then plotted and, optionally, the percentage of the total risk covered at that caseload. The precision (i.e., the strike rate) is also plotted. Such a chart allows a user to select an appropriate tradeoff between caseload and performance. The charts are similar to ROC curves.
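The quantities described above can be sketched in a few lines of base R. This is an illustration only, not the code riskchart itself uses: the scores, outcomes and risk amounts are invented, the helper coverage_at is hypothetical, and treating an entity as covered when its score is greater than or equal to the cutoff is an assumption.

```r
## Hypothetical scores, outcomes and risk amounts, invented for illustration.
score  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
actual <- c(1,   1,   0,   1,   0,   0,   1,   0)
risk   <- c(50,  30,  0,   10,  0,   0,   10,  0)

## Summarise coverage at one probability cutoff (hypothetical helper).
## Assumption: an entity is covered when its score is >= the cutoff.
coverage_at <- function(cutoff, score, actual, risk) {
  covered <- score >= cutoff
  c(caseload  = mean(covered),                           # share of entities covered
    recall    = sum(actual[covered]) / sum(actual),      # share of positives covered
    risk      = sum(risk[covered]) / sum(risk),          # share of total risk covered
    precision = if (any(covered)) mean(actual[covered])  # strike rate among covered
                else NA_real_)                           # undefined when nothing is covered
}

## One row per cutoff, one column per measure.
cutoffs <- c(0, 0.5, 1)
result <- t(sapply(cutoffs, coverage_at,
                   score = score, actual = actual, risk = risk))
rownames(result) <- paste("cutoff", cutoffs)
print(round(result, 3))
```

With this toy data, a cutoff of 0 covers every entity, a cutoff of 1 covers none, and a cutoff of 0.5 covers half of the entities while capturing three quarters of the positive cases and 90% of the risk. A risk chart traces these measures across the whole range of caseloads.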
Author(s)
References
Package home page: https://rattle.togaware.com
See Also
evaluateRisk, genPlotTitleCmd.
Examples
## Not run:
## Load rattle for riskchart() and the weather dataset, and rpart to build a decision tree.
library(rattle)
library(rpart)
## Set up the data for modelling.
set.seed(42)
ds <- weather
target <- "RainTomorrow"
risk <- "RISK_MM"
ignore <- c("Date", "Location", risk)
vars <- setdiff(names(ds), ignore)
nobs <- nrow(ds)
form <- formula(paste(target, "~ ."))
train <- sample(nobs, 0.7*nobs)
test <- setdiff(seq_len(nobs), train)
actual <- ds[test, target]
risks <- ds[test, risk]
## Build the model.
model <- rpart(form, data=ds[train, vars])
## Obtain predictions.
predicted <- predict(model, ds[test, vars], type="prob")[,2]
## Plot the Risk Chart.
riskchart(predicted, actual, risks)
## End(Not run)