R: Thresholds needed to create the extended confusion matrix

thresholds {confcons}

R Documentation

Thresholds needed to create the extended confusion matrix

Description

Calculate the two thresholds distinguishing certain negatives/positives from uncertain predictions. The thresholds are needed to create the extended confusion matrix and are further used for confidence calculation.
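
As a rough illustration of what distinguishing certain from uncertain predictions means, the base R sketch below sorts a few predictions into three groups using two threshold values. The threshold values, the variable names and the treatment of predictions lying exactly on a threshold are assumptions made for this illustration; they are not taken from the package.

```r
# Hypothetical threshold values, chosen only for illustration
threshold1 <- 0.35  # assumed upper limit of the certain negatives
threshold2 <- 0.65  # assumed lower limit of the certain positives

predictions <- c(0.10, 0.30, 0.50, 0.70, 0.90)

# Assumption: values lying exactly on a threshold count as uncertain
category <- ifelse(predictions < threshold1, "certain negative",
                   ifelse(predictions > threshold2, "certain positive",
                          "uncertain"))
table(category)
```

Predictions falling between the two thresholds are the ones treated as uncertain here.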

Usage

thresholds(observations, predictions = NULL, type = "mean", range = 0.5)

Arguments

`observations`	Either an integer or logical vector containing the binary observations where presences are encoded as `1`s/`TRUE`s and absences as `0`s/`FALSE`s.
`predictions`	A numeric vector containing the predicted probabilities of occurrence typically within the `[0, 1]` interval. `length(predictions)` should be equal to `length(observations)` and the order of the elements should match. `predictions` is optional: needed and used only if `type` is 'mean' and ignored otherwise.
`type`	A character vector of length one containing the value 'mean' (for calculating the mean of the predictions within known presences and absences) or 'information' (for calculating thresholds based on relative information gain). Defaults to 'mean'.
`range`	A numeric vector of length one containing a value from the `]0, 0.5]` interval. It is the parameter of the information-based method and is used only if `type` is 'information'. The larger the `range` is, the more predictions are treated as uncertain. Defaults to 0.5.

Value

A named numeric vector of length 2. The first element ('threshold1') is the mean of the probabilities predicted for the absence locations; it distinguishes certain negatives (certain absences) from uncertain predictions. The second element ('threshold2') is the mean of the probabilities predicted for the presence locations; it distinguishes certain positives (certain presences) from uncertain predictions. For a typical model that performs better than random guessing, the first element is smaller than the second one. The returned value might contain NaN(s) if the number of observed presences and/or absences is 0.
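
The mean-based calculation described above can be sketched in a few lines of base R. The helper `mean_thresholds()` and the toy data are hypothetical and exist only for this illustration; the sketch skips the input checks of the real function and should not be read as the package's implementation.

```r
# Hypothetical helper, not part of confcons: mean-based thresholds in base R
mean_thresholds <- function(observations, predictions) {
  observations <- as.logical(observations)
  stopifnot(length(observations) == length(predictions))
  c(threshold1 = mean(predictions[!observations]),  # mean over absences
    threshold2 = mean(predictions[observations]))   # mean over presences
}

obs <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
pred <- c(0.1, 0.2, 0.3, 0.6, 0.8)
th <- mean_thresholds(obs, pred)
print(th)

# With no observed presences, the second mean is taken over an empty vector,
# which is how a NaN can end up in the result
th_no_presence <- mean_thresholds(c(FALSE, FALSE), c(0.1, 0.3))
print(is.nan(th_no_presence[["threshold2"]]))
```

In this toy case the absences have lower predicted probabilities than the presences, so the first threshold is smaller than the second.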

Note

thresholds() should be called using the whole dataset containing both training and evaluation locations.
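
One way to follow this note is to pool the training and evaluation parts before the call. The sketch below assumes the data are held in four separate vectors; all object names and values are made up for illustration, and the call to `thresholds()` is run only if the confcons package is installed.

```r
# Hypothetical split data; all names and values are made up for illustration
observations_train <- c(TRUE, FALSE, FALSE, TRUE)
predictions_train  <- c(0.8, 0.2, 0.4, 0.6)
observations_eval  <- c(FALSE, TRUE)
predictions_eval   <- c(0.3, 0.7)

# Pool both parts in the same order so that the elements still match pairwise
observations_all <- c(observations_train, observations_eval)
predictions_all  <- c(predictions_train, predictions_eval)

# Call thresholds() only if the confcons package is available
if (requireNamespace("confcons", quietly = TRUE)) {
  print(confcons::thresholds(observations = observations_all,
                             predictions = predictions_all))
}
```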

Examples

set.seed(12345)

# Using logical observations:
observations_1000_logical <- c(rep(x = FALSE, times = 500),
                               rep(x = TRUE, times = 500))
predictions_1000 <- c(runif(n = 500, min = 0, max = 0.7),
                      runif(n = 500, min = 0.3, max = 1))
thresholds(observations = observations_1000_logical,
           predictions = predictions_1000) # 0.370 0.650

# Using integer observations:
observations_4000_integer <- c(rep(x = 0L, times = 3000),
                               rep(x = 1L, times = 1000))
predictions_4000 <- c(runif(n = 3000, min = 0, max = 0.8),
                      runif(n = 1000, min = 0.2, max = 0.9))
thresholds(observations = observations_4000_integer,
           predictions = predictions_4000) # 0.399 0.545

# Wrong parameterization:
try(thresholds(observations = observations_1000_logical,
               predictions = predictions_4000)) # error
set.seed(12345)
observations_4000_numeric <- c(rep(x = 0, times = 3000),
                               rep(x = 1, times = 1000))
predictions_4000_strange <- c(runif(n = 3000, min = -0.3, max = 0.4),
                              runif(n = 1000, min = 0.6, max = 1.5))
try(thresholds(observations = observations_4000_numeric,
               predictions = predictions_4000_strange)) # multiple warnings
mask_of_normal_predictions <- predictions_4000_strange >= 0 &
                              predictions_4000_strange <= 1
thresholds(observations = as.integer(observations_4000_numeric)[mask_of_normal_predictions],
           predictions = predictions_4000_strange[mask_of_normal_predictions]) # OK

[Package confcons version 0.3.1 Index]