accuracy.meas {ROSE}

R Documentation

Metrics to evaluate a classifier's accuracy in imbalanced learning

Description

This function computes precision, recall and the F measure of a prediction.

Usage

accuracy.meas(response, predicted, threshold = 0.5)

Arguments

`response`	A vector of responses containing two classes to be used to evaluate prediction accuracy. It can be of class `"factor"`, `"numeric"` or `"character"`.
`predicted`	A vector containing a prediction for each observation. This can be of class `"factor"` or `"character"` if predicted class labels are provided, or `"numeric"` for the probabilities of the rare class (or a monotonic function of them).
`threshold`	When `predicted` is of class `"numeric"`, the probability threshold used to classify an example as positive. The default value, 0.5, is meant for predicted probabilities. See further details below. Ignored if `predicted` is of class `"factor"`.

Details

The predicted labels depend on the classification threshold, defined here as the value such that observations with a predicted value greater than the threshold are assigned to the positive class. Some caution is due both in setting the threshold and in relying on the default: the default of 0.5 is meant for predicted probabilities, and it is not necessarily the optimal choice for imbalanced learning. Smaller threshold values correspond to assigning a larger misclassification cost to the rare class, which is usually the more costly class to misclassify.
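The thresholding rule described above can be sketched in base R. This is only an illustration of the rule as stated in this section, not the package's internal code; the scores and the helper `classify` are hypothetical.

```r
# Hypothetical predicted probabilities of the rare (positive) class
scores <- c(0.01, 0.03, 0.10, 0.45, 0.60, 0.02)

# Rule as described above: a predicted value strictly greater than
# the threshold is assigned to the positive class
# (classify is a hypothetical helper, not part of ROSE)
classify <- function(scores, threshold = 0.5) {
  ifelse(scores > threshold, "positive", "negative")
}

labels_default <- classify(scores)                    # threshold = 0.5
labels_low     <- classify(scores, threshold = 0.02)  # lower threshold

# A lower threshold labels more observations as positive
table(labels_default)
table(labels_low)
```

With these scores, lowering the threshold from 0.5 to 0.02 moves three further observations into the positive class, which is the sense in which a smaller threshold favors the rare class.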

Precision is defined as follows:

\frac{\mbox{true positives}}{\mbox{true positives + false positives}}

Recall is defined as:

\frac{\mbox{true positives}}{\mbox{true positives + false negatives}}

The F measure is the harmonic mean of precision and recall:

2 \cdot \frac{\mbox{precision} \cdot \mbox{recall}}{\mbox{precision+recall}}
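As a worked example of the three formulas above, the following base R sketch computes them by hand from a small set of labels. The vectors are made up for illustration, and class 1 is assumed to be the rare (positive) class.

```r
# Hypothetical true labels and predicted labels (1 = positive, 0 = negative)
response  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)

# Counts entering the formulas
tp <- sum(predicted == 1 & response == 1)  # true positives
fp <- sum(predicted == 1 & response == 0)  # false positives
fn <- sum(predicted == 0 & response == 1)  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# Harmonic mean of precision and recall
f_measure <- 2 * precision * recall / (precision + recall)

c(precision = precision, recall = recall, F = f_measure)
```

Here there are 3 true positives, 2 false positives and 1 false negative, so precision is 3/5, recall is 3/4, and the F measure is 2/3.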

Value

The value is an object of class `accuracy.meas` with the following components:

`Call`	The matched call.
`threshold`	The selected threshold.
`precision`	A vector of length one giving the precision of the prediction.
`recall`	A vector of length one giving the recall of the prediction.
`F`	A vector of length one giving the F measure of the prediction.
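The components listed above can presumably be extracted from the returned object with `$`. The sketch below assumes the ROSE package is installed and that the object behaves like a list with the component names given in this section; the data are hypothetical, and the call is skipped if the package is not available.

```r
# Hypothetical data: 1 is the rare (positive) class
response  <- c(0, 0, 0, 0, 0, 0, 1, 1)
predicted <- c(0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.8, 0.9)

acc <- NULL
if (requireNamespace("ROSE", quietly = TRUE)) {
  acc <- ROSE::accuracy.meas(response, predicted, threshold = 0.5)

  # Component names as documented in the Value section
  # (list-style access with $ is an assumption)
  acc$threshold
  acc$precision
  acc$recall
  acc$F
}
```

Storing the result this way makes it possible to compare the measures across several candidate thresholds rather than reading them off the printed output.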

References

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27 (8), 861–875.

Examples

# 2-dimensional example
# loading the package and the data
library(ROSE)
data(hacide)

# imbalance on training set
table(hacide.train$cls)

# model estimation using logistic regression
fit.hacide <- glm(cls ~ ., data = hacide.train, family = "binomial")

# prediction on training set
pred.hacide.train <- predict(fit.hacide, newdata=hacide.train,
                             type="response")

# compute accuracy measures (training set)
accuracy.meas(hacide.train$cls, pred.hacide.train, threshold = 0.02)

# imbalance on test set 
table(hacide.test$cls)

# prediction on test set
pred.hacide.test <- predict(fit.hacide, newdata=hacide.test,
                            type="response")

# compute accuracy measures (test set)
accuracy.meas(hacide.test$cls, pred.hacide.test, threshold = 0.02)

[Package ROSE version 0.0-4 Index]