R: Calculate misclassification cost

misclassCost {CustomerScoringMetrics}

R Documentation

Calculate misclassification cost

Description

Calculates the absolute misclassification cost value for a set of predictions.

Usage

misclassCost(predTest, depTest, costType = c("costRatio", "costMatrix",
  "costVector"), costs = NULL, cutoff = 0.5, dyn.cutoff = FALSE,
  predVal = NULL, depVal = NULL)

Arguments

`predTest`	Vector with predictions (real-valued or discrete)
`depTest`	Vector with real class labels
`costType`	An argument that specifies how the cost information is provided. This should be one of `"costRatio"`, `"costMatrix"` or `"costVector"`. With `"costRatio"`, a single value is provided which reflects the cost ratio (the ratio of the cost associated with a false negative to the cost associated with a false positive). With `"costMatrix"`, a full (2x2) misclassification cost matrix should be provided in the form `rbind(c(0,3),c(15,0))`, where in this example 3 is the cost for a false positive and 15 the cost for a false negative. With `"costVector"`, a vector of costs is provided with one element per observation, as in the example below.
`costs`	Cost information in the format selected by `costType` (see `costType`).
`cutoff`	Threshold for converting real-valued predictions into class predictions. Default 0.5.
`dyn.cutoff`	Logical indicator to enable dynamic threshold determination using validation sample predictions. In this case, the function uses the validation data to determine the incidence (the occurrence percentage of the customer behavior or characteristic of interest) and chooses a cutoff value so that the number of predicted positives is equal to the number of actual positives. If `TRUE`, the value of the `cutoff` parameter is ignored.
`predVal`	Vector with predictions (real-valued or discrete) for the validation data. Only used if `dyn.cutoff` is `TRUE`.
`depVal`	Optional vector with true class labels for validation data. Only used if `dyn.cutoff` is `TRUE`.
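
The dynamic threshold described under `dyn.cutoff` can be illustrated with a minimal base R sketch. This is not the package's own code: the helper `dynamic_cutoff`, its `positive` argument, the toy data, and the use of `>=` for thresholding are all assumptions made for illustration.

```r
# Hypothetical helper, not part of CustomerScoringMetrics.
# Picks the cutoff so that the number of validation predictions at or
# above it equals the number of actual positives in the validation data.
dynamic_cutoff <- function(predVal, depVal, positive = 1) {
  n_pos <- sum(depVal == positive)      # number of actual positives
  if (n_pos == 0) return(Inf)           # no positives: predict none
  sorted <- sort(predVal, decreasing = TRUE)
  sorted[n_pos]                         # score of the n_pos-th highest prediction
}

# Toy validation sample (assumed data): 3 positives out of 10 (incidence 30%)
predVal <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.7, 0.05, 0.4, 0.3)
depVal  <- c(1,   1,   0,    1,   0,   0,   0,   0,    0,   0)

cutoff <- dynamic_cutoff(predVal, depVal)
n_pred_pos <- sum(predVal >= cutoff)    # predicted positives at this cutoff
```

With tied prediction scores, more observations than intended may reach the cutoff, so the match between predicted and actual positives is only approximate in that case.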

Value

A list with the following elements:

`misclassCost`	Total misclassification cost value
`cutoff`	The threshold value used to convert real-valued predictions to class predictions
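
As a rough illustration of how a total misclassification cost follows from a cutoff and a cost matrix, consider the base R sketch below. It is written under stated assumptions and is not the package's implementation: the helper `cost_from_matrix`, the `positive` argument, the toy data, and the rule that scores at or above the cutoff count as positive are hypothetical. The cost matrix layout follows the `costType` description, with the false positive cost in position `[1, 2]` and the false negative cost in position `[2, 1]`.

```r
# Hypothetical helper, not part of CustomerScoringMetrics.
cost_from_matrix <- function(pred, dep, cost_matrix, cutoff = 0.5, positive = 1) {
  pred_pos <- pred >= cutoff            # assumption: >= cutoff means predicted positive
  actual_pos <- dep == positive
  n_fp <- sum(pred_pos & !actual_pos)   # false positives
  n_fn <- sum(!pred_pos & actual_pos)   # false negatives
  n_fp * cost_matrix[1, 2] + n_fn * cost_matrix[2, 1]
}

# Toy data (assumed): one false positive and one false negative at cutoff 0.5
pred <- c(0.9, 0.2, 0.7, 0.4, 0.6, 0.1)
dep  <- c(1,   1,   0,   0,   1,   0)

# Cost matrix from the costType description: FP costs 3, FN costs 15
total_matrix <- cost_from_matrix(pred, dep, rbind(c(0, 3), c(15, 0)))

# A cost ratio of 5 expressed as a matrix, assuming the FP cost is normalized to 1
total_ratio <- cost_from_matrix(pred, dep, rbind(c(0, 1), c(5, 0)))
```

Here `total_matrix` is 1 * 3 + 1 * 15 = 18 and `total_ratio` is 1 * 1 + 1 * 5 = 6. Whether the package normalizes the false positive cost to 1 for `"costRatio"` is an assumption of this sketch.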

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

Witten, I.H., Frank, E. (2005): Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Chapter 5. Morgan Kaufmann.

Examples

## Load response modeling data set
data("response")
## Generate a cost vector with one (random) cost per test observation
costs <- runif(nrow(response$test), 1, 100)
## Apply misclassCost function to obtain the misclassification cost for the
## predictions for the test sample, using the observation-specific cost vector.
emc <- misclassCost(response$test[,2], response$test[,1], costType="costVector",
  costs=costs)
## The result is a list; the total cost is stored in the misclassCost element
print(emc$misclassCost)

[Package CustomerScoringMetrics version 1.0.0 Index]