roc.curve {randomUniformForest} | R Documentation |
ROC and precision-recall curves for random Uniform Forests
Description
Plots ROC and precision-recall curves for objects of class randomUniformForest and computes the F-beta score. It also works for any other model that provides predicted labels (ROC curve only).
Usage
roc.curve(X, Y, classes,
positive = classes[2],
ranking.threshold = 0,
ranking.values = 0,
falseDiscoveryRate = FALSE,
plotting = TRUE,
printValues = TRUE,
Beta = 1)
Arguments
X |
a vector (or a factor) of predictions (with two classes) or an object of class randomUniformForest (with OOB option enabled). |
Y |
a vector of numeric (integer) responses, or a factor, with two classes. |
classes |
a vector (or a factor) of values that designate the class. |
positive |
the label treated as the positive class (e.g. the minority class). Defaults to the second element of 'classes'. |
ranking.threshold |
option currently implemented but not fully tested. |
ranking.values |
option currently implemented but not fully tested. |
falseDiscoveryRate |
if TRUE, the precision-recall curve is plotted. If FALSE (the default), the ROC curve is plotted. |
plotting |
if TRUE, the curve is plotted. |
printValues |
if TRUE, values are printed to the screen. |
Beta |
'beta' value for F-beta score. |
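As a rough illustration of what the 'Beta' argument controls, the sketch below computes the standard F-beta score, (1 + Beta^2) * precision * recall / (Beta^2 * precision + recall), from two vectors of labels. The helper fBetaScore and the toy label vectors are hypothetical and written only for this illustration; it is an assumption, not something stated on this page, that roc.curve uses exactly this definition internally.

```r
# Hypothetical helper, not part of randomUniformForest:
# standard F-beta score computed from predicted and actual labels.
fBetaScore <- function(predicted, actual, positive, Beta = 1) {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  if (tp == 0) return(0)  # convention chosen here: no true positive gives a score of 0
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  (1 + Beta^2) * precision * recall / (Beta^2 * precision + recall)
}

# toy labels, made up for the illustration
actual    <- c("benign", "benign", "malignant", "malignant", "malignant", "benign")
predicted <- c("benign", "malignant", "malignant", "malignant", "malignant", "benign")

f1 <- fBetaScore(predicted, actual, positive = "malignant")            # Beta = 1
f2 <- fBetaScore(predicted, actual, positive = "malignant", Beta = 2)  # recall weighted more
print(c(F1 = f1, F2 = f2))
```

With Beta greater than 1 the score leans toward recall, which matches the objective stated in the examples of catching as many malignant cases as possible.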
Author(s)
Saip Ciss saip.ciss@wanadoo.fr
See Also
importance.randomUniformForest
Examples
## Classification : "breast cancer" data
# http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
data(breastCancer)
breastCancer.data <- breastCancer
# remove ID (first column) and split data into training and test sets
breastCancer.data <- breastCancer.data[, -1]
n <- nrow(breastCancer.data)
p <- ncol(breastCancer.data)
trainTestIdx <- cut(sample(1:n, n), 2, labels = FALSE)
# train examples
breastCancer.data.train <- breastCancer.data[trainTestIdx == 1, -p]
breastCancer.class.train <- as.factor(breastCancer.data[trainTestIdx == 1, p])
# rename classes to benign (class 2) and malignant (class 4) for readability
levels(breastCancer.class.train) <- c("benign", "malignant")
# test data
breastCancer.data.test <- breastCancer.data[trainTestIdx == 2, -p]
breastCancer.class.test <- as.factor(breastCancer.data[trainTestIdx == 2, p])
levels(breastCancer.class.test) <- c("benign", "malignant")
# compute model: train then test in the same function call, and assign class weights
# to better match the class distribution (OOB errors and Breiman's bounds can help to choose weights).
# Note that in this case 'recall' (or sensitivity) is the objective,
# i.e. catch as many malignant tumours as possible, even if the false positive rate increases
# (in that case, further steps will reveal the truth). If a malignant tumour is not detected,
# the diagnosis error is, by far, more critical.
breastCancer.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train,
    xtest = breastCancer.data.test, ytest = breastCancer.class.test,
    classwt = c(1, 3.5), threads = 2, ntree = 40, BreimanBounds = FALSE)
# get a summary of the model
breastCancer.ruf
## plot ROC Curve for test data
# roc.curve(breastCancer.ruf, breastCancer.class.test, levels(breastCancer.class.test))
## plot precision-recall curve for test data
# roc.curve(breastCancer.ruf, breastCancer.class.test, levels(breastCancer.class.test),
# falseDiscoveryRate = TRUE)
## combine a cut-off with a purely random forest as an alternative way to find
## the maximum number of malignant cases with a low false positive rate.
## The 'classcutoff' option is a bit tricky. Take the example used below:
## classcutoff = c("benign", 1.25) means that the number of votes for class "benign"
## will be weighted by 0.4 (= Cte/1.25, where Cte = 0.5) for each response.
## Hence "benign" will never have the majority unless it has 2.5 (1.25/0.5) times more votes
## than the "malignant" class, where all votes sum to the total number of trees.
# breastCancer.cutOff.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train,
# xtest = breastCancer.data.test, ytest = breastCancer.class.test, classcutoff = c("benign", 1.25),
# randomfeature = TRUE, ntree = 50, threads = 2, BreimanBounds = FALSE)
# roc.curve(breastCancer.cutOff.ruf, breastCancer.class.test, levels(breastCancer.class.test))
# roc.curve(breastCancer.cutOff.ruf, breastCancer.class.test, levels(breastCancer.class.test),
# falseDiscoveryRate = TRUE)
## evaluate OOB data, when there is no test set
# breastCancer.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train,
# classwt = c(1, 3.5), threads = 2)
# roc.curve(breastCancer.ruf, breastCancer.class.train, levels(breastCancer.class.train))
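The vote arithmetic described in the 'classcutoff' comments above can be checked by hand. The sketch below only reproduces that arithmetic for ntree = 50; it does not call the package, all variable names are hypothetical, and it assumes that "malignant" votes keep a weight of 1, which the comments imply but do not state.

```r
# Illustration only: reproduces the arithmetic from the comments,
# not the internals of randomUniformForest.
Cte <- 0.5
cutoffValue <- 1.25                       # from classcutoff = c("benign", 1.25)
benignWeight <- Cte / cutoffValue         # weight applied to "benign" votes
requiredRatio <- 1 / benignWeight         # benign-to-malignant vote ratio needed to win

ntree <- 50
benignVotes <- 0:ntree                    # every possible split of the votes
malignantVotes <- ntree - benignVotes     # all votes sum to the number of trees

# assumption: "malignant" votes are unweighted (weight 1)
benignWins <- benignWeight * benignVotes > malignantVotes
minBenignVotes <- min(benignVotes[benignWins])

print(c(weight = benignWeight, ratio = requiredRatio, minVotes = minBenignVotes))
```

Under these assumptions, "benign" needs at least 36 of the 50 votes, since 35 weighted votes (14) do not exceed the 15 remaining "malignant" votes.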