optiThresh {modEvA} | R Documentation |
Optimize threshold for model evaluation.
Description
The 'optiThresh' function calculates optimal thresholds for a number of model evaluation measures (see threshMeasures
). Optimization is given for each measure, and/or for all measures according to particular criteria (e.g. Jimenez-Valverde & Lobo 2007; Liu et al. 2005; Nenzen & Araujo 2011). Results are given numerically and in plots.
Usage
optiThresh(model = NULL, obs = NULL, pred = NULL, interval = 0.01,
measures = c(modEvAmethods("threshMeasures"), modEvAmethods("similarity")),
optimize = modEvAmethods("optiThresh"), simplif = FALSE, plot = TRUE,
sep.plots = FALSE, xlab = "Threshold", na.rm = TRUE, rm.dup = FALSE, verbosity = 2, ...)
Arguments
model |
a binary-response model object of class "glm", "gam", "gbm", "randomForest" or "bart". If this argument is provided, 'obs' and 'pred' will be extracted with |
obs |
alternatively to 'model' and together with 'pred', a numeric vector of observed presences (1) and absences (0) of a binary response variable. Alternatively (and if 'pred' is a 'SpatRaster'), a two-column matrix or data frame containing, respectively, the x (longitude) and y (latitude) coordinates of the presence points, in which case the 'obs' vector will be extracted with |
pred |
alternatively to 'model' and together with 'obs', a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike. Must be of the same length and in the same order as 'obs'. Alternatively (and if 'obs' is a set of point coordinates), a 'SpatRaster' map of the predicted values for the entire evaluation region, in which case the 'pred' vector will be extracted with |
interval |
numeric value between 0 and 1 indicating the interval between the thresholds at which to calculate the evaluation measures. Defaults to 0.01. |
measures |
character vector indicating the names of the model evaluation measures for which to calculate optimal thresholds. The default is using all measures available in 'c(modEvAmethods("threshMeasures"), modEvAmethods("similarity"))'. |
optimize |
character vector indicating the threshold optimization criteria to use; "each" calculates the optimal threshold for each model evaluation measure, while the remaining options optimize all measures according to the specified criterion. The default is using all criteria available in 'modEvAmethods("optiThresh")'. |
simplif |
logical, whether to compute a faster simplified version. Used internally in other functions. |
plot |
logical, whether to plot the values of each evaluation measure at all thresholds. Ignored if simplif=TRUE. |
sep.plots |
logical. If TRUE, each plot is presented separately (you need to be recording R plot history to be able to browse through them all); if FALSE(the default), all plots are presented together in the same plotting window. |
xlab |
character vector indicating the label of the x axis. |
na.rm |
Logical value indicating whether missing values should be ignored in computations. Defaults to TRUE. |
rm.dup |
If |
verbosity |
integer specifying the amount of messages to display. Defaults to the maximum implemented; lower numbers (down to 0) decrease the number of messages. |
... |
additional arguments to pass to |
Value
This function returns a list with the following components:
all.thresholds |
a data frame with the values of all analysed measures at all analysed thresholds. |
optimals.each |
if "each" is among the threshold criteria specified in 'optimize', optimals.each is output as a data frame with the value of each measure at its optimal threshold, as well as the type of optimal for that measure (which may be the maximum for measures of goodness such as "Sensitivity", or the minimum for measures of badness such as "Omission"). |
optimals.criteria |
a data frame with the values of measure at the threshold that maximizes each of the criteria specified in 'optimize' (except for "each", see above). |
Note
"Sensitivity" is the same as "Recall", and "PPP" (positive predictive power) is the same as "Precision". "F1score"" is the harmonic mean of precision and recall.
Note
Some measures cannot be calculated for thresholds at which there are zeros in the confusion matrix, hence the eventual 'NaN' or 'Inf' in results. Also, optimization may be deceiving for some measures; use 'plot = TRUE' and inspect the plot(s).
Author(s)
A. Marcia Barbosa
References
Jimenez-Valverde A. & Lobo J.M. (2007) Threshold criteria for conversion of probability of species presence to either-or presence-absence. Acta Oecologica 31: 361-369.
Liu C., Berry P.M., Dawson T.P. & Pearson R.G. (2005) Selecting thresholds of occurrence in the prediction of species distributions. Ecography 28: 385-393.
Nenzen H.K. & Araujo M.B. (2011) Choice of threshold alters projections of species range shifts under climate change. Ecological Modelling 222: 3346-3354.
See Also
Examples
# load sample models:
data(rotif.mods)
# choose a particular model to play with:
mod <- rotif.mods$models[[1]]
## Not run:
optiThresh(model = mod)
## End(Not run)
# change some of the parameters:
optiThresh(model = mod, pch = 20,
measures = c("CCR", "Sensitivity", "kappa", "TSS", "Jaccard", "F1score"),
ylim = c(0, 1))
# you can also use optiThresh with vectors of observed and predicted
# values instead of with a model object:
## Not run:
optiThresh(obs = mod$y, pred = mod$fitted.values, pch = 20)
## End(Not run)
# 'obs' can also be a table of presence point coordinates
# and 'pred' a SpatRaster of predicted values