calibrate {CORElearn} | R Documentation |
Given probability scores predictedProb, as provided for example by a call to predict.CoreModel, and using one of the available methods given by method, the function calibrates the predicted probabilities so that they match the actual probabilities of the binary class 1 provided by correctClass.
The computed calibration can be applied to the scores returned by that model.
calibrate(correctClass, predictedProb, class1=1,
          method = c("isoReg","binIsoReg","binning","mdlMerge"),
          weight=NULL, noBins=10, assumeProbabilities=FALSE)
applyCalibration(predictedProb, calibration)
correctClass |
A vector of correct class labels for a binary classification problem. |
predictedProb |
A vector of predicted class 1 (probability) scores. In |
class1 |
A class value (factor) or an index of the class value to be taken as a class to be calibrated. |
method |
One of "isoReg", "binIsoReg", "binning", or "mdlMerge"; the methods are described below. |
weight |
If specified, should be of the same length as |
noBins |
The meaning of this parameter depends on the parameter method: it specifies the desired or the initial number of bins, as described below. |
assumeProbabilities |
If |
calibration |
The list resulting from a call to calibrate. |
Depending on the specified method, one of the following calibration methods is executed.
"isoReg"
isotonic regression calibration based on the pool-adjacent-violators (PAV) algorithm.
"binning"
calibration into a pre-specified number of bands given by the noBins parameter, trying to make bins of equal weight.
"binIsoReg"
first the binning method is executed, followed by isotonic regression calibration.
"mdlMerge"
first intervals are merged by an MDL gain criterion into a pre-specified number of intervals, followed by isotonic regression calibration.
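To make the idea behind "isoReg" concrete, the following is a minimal sketch of isotonic regression calibration in base R. It uses stats::isoreg, which fits a non-decreasing step function with the pool-adjacent-violators algorithm. This is not the CORElearn implementation; the simulated data and the names score, y, calib and newScore are assumptions made for illustration only.

```r
# Illustrative sketch in base R; NOT the code used inside calibrate().
set.seed(1)
n <- 500
score <- runif(n)                         # hypothetical classifier scores in [0, 1]
y <- rbinom(n, size = 1, prob = score^2)  # class-1 indicator; scores overestimate P(class 1)

# isoreg() fits a non-decreasing step function of the score to the 0/1 labels
fit <- isoreg(score, y)

# as.stepfun() turns the fit into a function that can be evaluated at new scores
calib <- as.stepfun(fit)
newScore <- c(0.1, 0.5, 0.9)
calibrated <- calib(newScore)
print(calibrated)
```

Because the fitted values are averages of 0/1 labels over pooled groups, the calibrated values stay within [0, 1] and never decrease as the score grows.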
If method="binning", the parameter noBins specifies the desired number of bins, i.e., calibration bands;
if method="binIsoReg", the parameter noBins specifies the number of initial bins that are formed by binning before isotonic regression is applied;
if method="mdlMerge", the parameter noBins specifies the number of bins formed after first applying isotonic regression; the most similar bins are merged using the MDL criterion.
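The role of noBins in equal-weight binning can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's algorithm: the bins are formed by cutting the cumulative instance weight into noBins equal parts, the upper boundary of a bin is taken to be the largest score that falls into it, and the data and the names score, y, w, bin and upper are hypothetical.

```r
# Illustrative sketch in base R; NOT the code used inside calibrate().
set.seed(2)
n <- 400
score <- runif(n)                # hypothetical predicted class-1 scores
y <- rbinom(n, size = 1, prob = score)
w <- rep(1, n)                   # instance weights; all equal in this sketch
noBins <- 10

# sort instances by score and assign bins holding roughly equal total weight
ord <- order(score)
bin <- pmin(noBins, ceiling(cumsum(w[ord]) * noBins / sum(w)))

# calibrated probability of a bin = weighted share of class-1 instances in it
calProb <- tapply(w[ord] * y[ord], bin, sum) / tapply(w[ord], bin, sum)

# assumed boundary convention: largest score observed in each bin
upper <- tapply(score[ord], bin, max)
print(cbind(upper, calProb))
```

With unit weights and 400 instances, each of the 10 bins receives 40 instances. Unlike isotonic regression, plain binning does not force the calibrated probabilities to be monotone in the score.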
The function returns a list with two vector components of the same length:
interval |
The boundaries of the intervals. Lower boundary 0 is not explicitly included but should be taken into account. |
calProb |
The calibrated probabilities for each corresponding interval. |
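How such a list maps a score to a calibrated probability can be sketched with base R's findInterval. The calibration list below is made up, lookupCalibrated is a hypothetical helper rather than a CORElearn function, and the convention that each interval is open on the left and closed on the right is an assumption; applyCalibration may treat boundary values differently.

```r
# Hypothetical calibration list with the two components described above
calibration <- list(interval = c(0.25, 0.50, 0.75, 1.00),
                    calProb  = c(0.05, 0.30, 0.60, 0.95))

# Assumption: interval i covers (lower_i, upper_i], with the implicit lower_1 = 0
lookupCalibrated <- function(score, calibration) {
  idx <- findInterval(score, calibration$interval, left.open = TRUE) + 1
  idx <- pmin(idx, length(calibration$calProb))  # scores above the last boundary
  calibration$calProb[idx]
}

lookupCalibrated(c(0.10, 0.25, 0.40, 0.99), calibration)
# returns 0.05 0.05 0.30 0.95
```

The implicit lower boundary 0 is why the index returned by findInterval is shifted by one: a score below the first listed boundary falls into the first interval.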
Marko Robnik-Sikonja
I. Kononenko, M. Kukar: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood, 2007
A. Niculescu-Mizil, R. Caruana: Predicting Good Probabilities With Supervised Learning. Proceedings of the 22nd International Conference on Machine Learning (ICML'05), 2005
reliabilityPlot, CORElearn, predict.CoreModel.
# generate data set separately for training the model,
# calibration of probabilities and testing
train <- classDataGen(noInst=200)
cal <- classDataGen(noInst=200)
test <- classDataGen(noInst=200)

# build random forests model with default parameters
modelRF <- CoreModel(class~., train, model="rf", maxThreads=1)

# prediction
predCal <- predict(modelRF, cal, rfPredictClass=FALSE)
predTest <- predict(modelRF, test, rfPredictClass=FALSE)
destroyModels(modelRF) # clean up, model not needed anymore

# calibrate for a chosen class1 and method
class1 <- 1
calibration <- calibrate(cal$class, predCal$prob[,class1], class1=class1,
                         method="isoReg", assumeProbabilities=TRUE)

# apply the calibration to the testing set
calibratedProbs <- applyCalibration(predTest$prob[,class1], calibration)

# the calibration of probabilities can be visualized with
# reliabilityPlot function