R: Multi-class AUC

multiclass.roc {pROC}

R Documentation

Multi-class AUC

Description

This function builds multiple ROC curves to compute the multi-class AUC as defined by Hand and Till.

Usage

multiclass.roc(...)
## S3 method for class 'formula'
multiclass.roc(formula, data, ...)
## Default S3 method:
multiclass.roc(response, predictor,
levels=base::levels(as.factor(response)), 
percent=FALSE, direction = c("auto", "<", ">"), ...)

Arguments

`response`	a factor, numeric or character vector of responses (true class), typically encoded with 0 (controls) and 1 (cases), as in `roc`.
`predictor`	either a numeric vector containing the value of each observation, as in `roc`, or a matrix giving the decision value (e.g. probability) for each class.
`formula`	a formula of the type `response~predictor`.
`data`	a matrix or data.frame containing the variables in the formula. See `model.frame` for more details.
`levels`	the value of the response for controls and cases respectively. In contrast with the `levels` argument to `roc`, all the levels are used and combined to compute the multiclass AUC.
`percent`	if the sensitivities, specificities and AUC must be given in percent (`TRUE`) or in fraction (`FALSE`, default).
`direction`	the direction of the comparison. “auto” (default for univariate curves): automatically determine in which group the median is higher and set the direction accordingly; not available for multivariate curves. “>” (default for multivariate curves): the predictor values for the control group are higher than the values of the case group (controls > t >= cases). “<”: the predictor values for the control group are lower than or equal to the values of the case group (controls < t <= cases).
`...`	further arguments passed to `roc`.

Details

This function computes the multiclass AUC as defined by Hand and Till (2001). A multiclass AUC is a mean of several AUCs and cannot be plotted. Only AUCs can be computed for such curves. Confidence intervals, standard deviation, smoothing and comparison tests are not implemented.
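
As a rough illustration of the averaging described above, the sketch below computes one AUC per pair of classes with base R only and takes their mean. It is a minimal sketch, not pROC's implementation: the helper names `pairwise_auc` and `mean_pairwise_auc` are hypothetical, the direction is fixed (higher predictor values are assumed to indicate the second level of each pair, with no “auto” detection), and the toy data are made up.

```r
# Hypothetical helper: AUC for one pair of classes, computed as the
# proportion of (case, control) pairs where the case scores higher
# (ties count for one half).
pairwise_auc <- function(controls, cases) {
  cmp <- outer(cases, controls, function(a, b) (a > b) + 0.5 * (a == b))
  mean(cmp)
}

# Hypothetical helper: mean of the AUCs over all pairs of levels.
mean_pairwise_auc <- function(response, predictor,
                              levels = base::levels(as.factor(response))) {
  pairs <- utils::combn(levels, 2, simplify = FALSE)
  aucs <- vapply(pairs, function(p) {
    pairwise_auc(predictor[response == p[1]], predictor[response == p[2]])
  }, numeric(1))
  mean(aucs)
}

# Toy data (made up for illustration): three classes, one predictor
toy_response  <- c("a", "a", "b", "b", "c", "c")
toy_predictor <- c(1, 2, 3, 5, 4, 6)
toy_auc <- mean_pairwise_auc(toy_response, toy_predictor)
```

The result is a single number, which is consistent with the remark that a multiclass AUC cannot be plotted as one curve.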

The multiclass.roc function can handle two types of datasets: uni- and multi-variate. In the univariate case, a single predictor vector is passed and all the combinations of responses are assessed. In the multivariate case, a matrix or data.frame is passed as predictor. The columns must be named according to the levels of the response.
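
The naming requirement for the multivariate case can be checked before calling the function. The following is a minimal base R sketch with made-up data; the variable names are illustrative only, and how multiclass.roc itself reacts to mismatched names is not shown here.

```r
# Made-up responses and a prediction matrix with one column per class
response <- factor(c("X1", "X2", "X3", "X1"))
predictor <- cbind(X1 = c(0.7, 0.2, 0.1, 0.6),
                   X2 = c(0.2, 0.5, 0.3, 0.3),
                   X3 = c(0.1, 0.3, 0.6, 0.1))

# Levels of the response that have no matching column in the predictor
missing_cols <- setdiff(levels(response), colnames(predictor))
# Columns of the predictor that do not correspond to any level
extra_cols <- setdiff(colnames(predictor), levels(response))

# TRUE when column names and response levels agree
names_ok <- length(missing_cols) == 0 && length(extra_cols) == 0
```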

This function has been much less tested than the rest of the package and is more likely to contain bugs. Please report any that you find.

Value

A list of class “multiclass.roc” (univariate case, when predictor is a vector) or “mv.multiclass.roc” (multivariate case, when predictor is a matrix or data.frame), with the following fields:

`auc`	if called with `auc=TRUE`, a numeric of class “auc” as defined in `auc`. Note that this is not the standard AUC but the multi-class AUC as defined by Hand and Till.
`ci`	if called with `ci=TRUE`, a numeric of class “ci” as defined in `ci`.
`response`	the response vector as passed in argument. If `NA` values were removed, a `na.action` attribute similar to `na.omit` stores the row numbers.
`predictor`	the predictor vector as passed in argument. If `NA` values were removed, a `na.action` attribute similar to `na.omit` stores the row numbers.
`levels`	the levels of the response as defined in argument.
`percent`	if the sensitivities, specificities and AUC are reported in percent, as defined in argument.
`call`	how the function was called. See `match.call` for more details.

Warnings

If response is an ordered factor and one of the levels specified in levels is missing, a warning is issued and the level is ignored.
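
To see in advance which levels have no observation, the levels can be tabulated with base R. This is a minimal sketch on a made-up ordered factor; the variable names are illustrative only.

```r
# Made-up ordered response in which level "2" has no observation
resp <- factor(c(1, 3, 3, 4, 5, 5), levels = 1:5, ordered = TRUE)

# Count observations per level and list the empty ones
counts <- table(resp)
empty_levels <- names(counts)[counts == 0]

# Levels that actually occur in the data
used_levels <- setdiff(levels(resp), empty_levels)
```

A vector such as `used_levels` could then be supplied as the `levels` argument to restrict the computation to levels that are present.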

References

David J. Hand and Robert J. Till (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning 45(2), p. 171–186. DOI: 10.1023/A:1010920819831.

Examples

####
# Examples for a univariate decision value
####
data(aSAH)

# Basic example
multiclass.roc(aSAH$gos6, aSAH$s100b)
# Produces an innocuous warning because one level has no observation

# Select only 3 of the aSAH$gos6 levels:
multiclass.roc(aSAH$gos6, aSAH$s100b, levels=c(3, 4, 5))

# Give the result in percent
multiclass.roc(aSAH$gos6, aSAH$s100b, percent=TRUE)

####
# Examples for multivariate decision values (e.g. class probabilities)
####

## Not run: 
# Example with a multinomial log-linear model from nnet
# We use the iris dataset and split into a training and test set
requireNamespace("nnet")
data(iris)
iris.sample <- sample(1:150)
iris.train <- iris[iris.sample[1:75],]
iris.test <- iris[iris.sample[76:150],]
mn.net <- nnet::multinom(Species ~ ., iris.train)

# Use predict with type="prob" to get class probabilities
iris.predictions <- predict(mn.net, newdata=iris.test, type="prob")
head(iris.predictions)

# This can be used directly in multiclass.roc:
multiclass.roc(iris.test$Species, iris.predictions)

## End(Not run)


# Let's see another example with an artificial dataset
n <- c(100, 80, 150)
responses <- factor(c(rep("X1", n[1]), rep("X2", n[2]), rep("X3", n[3])))
# construct prediction matrix: one column per class

preds <- lapply(n, function(x) runif(x, 0.4, 0.6))
predictor <- as.matrix(data.frame(
                "X1" = c(preds[[1]], runif(n[2] + n[3], 0, 0.7)),
                "X2" = c(runif(n[1], 0.1, 0.4), preds[[2]], runif(n[3], 0.2, 0.8)),
                "X3" = c(runif(n[1] + n[2], 0.3, 0.7), preds[[3]])
             ))
multiclass.roc(responses, predictor)

# One can change direction, partial.auc, percent, etc:
multiclass.roc(responses, predictor, direction = ">")
multiclass.roc(responses, predictor, percent = TRUE, 
	partial.auc = c(100, 90), partial.auc.focus = "se")


# Limit set of levels
multiclass.roc(responses, predictor, levels = c("X1", "X2"))
# Use with formula. Here we need a data.frame to store the responses as characters
data <- cbind(as.data.frame(predictor), "response" = responses)
multiclass.roc(response ~ X1+X3, data)

[Package pROC version 1.18.5 Index]