R: Evaluate the model based on presence-only data.

evaluate_po {itsdm}

R Documentation

Evaluate the model based on presence-only data.

Description

This function calculates two major types of evaluation metrics for presence-only data. The first type is metrics designed for presence-only data, such as the Contrast Validation Index (CVI), the continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background metrics, computed by treating extracted background points as pseudo-absence observations.

Usage

evaluate_po(
  model,
  occ_pred,
  bg_pred = NULL,
  var_pred,
  threshold = NULL,
  visualize = FALSE
)

Arguments

`model`	(`isolation_forest`) The extended isolation forest SDM. It could be the item `model` of `POIsotree` made by function `isotree_po`.
`occ_pred`	(`vector` of `numeric`) A vector of predicted values at occurrence locations.
`bg_pred`	(`vector` of `numeric`) A vector of predicted values at the background points, one value per point. The default is `NULL`.
`var_pred`	(`vector` of `numeric`) A vector of predicted values over the whole area. A vector is used to keep this function flexible for multiple types of output.
`threshold`	(`numeric` or `NULL`) The threshold to calculate threshold-based evaluation metrics. If `NULL`, a recommended threshold will be calculated based on optimal TSS value. The default is `NULL`.
`visualize`	(`logical`) If `TRUE`, plot the evaluation figures. The default is `FALSE`.

Details

CVI is the proportion of presence points falling in cells whose habitat suitability index is above a threshold (0.5, for example) minus the proportion of all cells in the model prediction that are above the same threshold. Three thresholds are used here: 0.25, 0.5, and 0.75.
The continuous Boyce index (CBI) is computed using moving windows with a resolution of 100 and the Kendall correlation method.
The ROC_ratio curve plots the proportion of presences falling above each of a range of thresholds against the proportion of cells falling above the same thresholds. The area under this modified ROC curve is called AUC_ratio.
Sensitivity (TPR) = TP/(TP + FN)
Specificity (TNR) = TN/(TN + FP)
True skill statistic (TSS) = Sensitivity + Specificity - 1
Jaccard's similarity index = TP/(FN + TP + FP)
Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)
Overprediction rate = FP/(TP + FP)
Underprediction rate = FN/(TP + FN)
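
As a rough illustration of the formulas above, the following base R sketch computes CVI and the threshold-based metrics from plain numeric vectors. The helper names `cvi_sketch` and `pb_metrics_sketch`, the toy vectors, and the use of `>=` at the threshold are assumptions made for this example; it is not the package's internal implementation.

```r
# Hypothetical helper: CVI at one threshold, assuming cells at or above
# the threshold count as suitable.
cvi_sketch <- function(occ_pred, var_pred, threshold) {
  mean(occ_pred >= threshold) - mean(var_pred >= threshold)
}

# Hypothetical helper: threshold-based metrics, treating background
# predictions as pseudo-absences.
pb_metrics_sketch <- function(occ_pred, bg_pred, threshold) {
  tp <- sum(occ_pred >= threshold)  # presences predicted present
  fn <- sum(occ_pred < threshold)   # presences predicted absent
  fp <- sum(bg_pred >= threshold)   # background predicted present
  tn <- sum(bg_pred < threshold)    # background predicted absent
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  list(
    sensitivity = sensitivity,
    specificity = specificity,
    tss = sensitivity + specificity - 1,
    jaccard = tp / (fn + tp + fp),
    sorensen = 2 * tp / (fn + 2 * tp + fp),
    overprediction = fp / (tp + fp),
    underprediction = fn / (tp + fn)
  )
}

# Toy data, made up for illustration only
occ_pred <- c(0.9, 0.8, 0.7, 0.4)
bg_pred  <- c(0.6, 0.3, 0.2, 0.1)
var_pred <- c(occ_pred, bg_pred)

cvi_val <- cvi_sketch(occ_pred, var_pred, threshold = 0.5)
metrics <- pb_metrics_sketch(occ_pred, bg_pred, threshold = 0.5)
```

With these toy vectors, three of four presences and one of four background points are at or above 0.5, so the sketch works from TP = 3, FN = 1, FP = 1, and TN = 3.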

Value

(POEvaluation) A list of

po_evaluation contains the presence-only evaluation metrics. It is a list of
- cvi (list) A list of CVI with 0.25, 0.5, and 0.75 as threshold
- boyce (list) A list of items related to continuous Boyce index (CBI)
- roc_ratio (list) A list of ROC ratio and AUC ratio
pb_evaluation contains the presence-background evaluation metrics. It is a list of
- confusion matrix (table) The confusion matrix. The columns are true values, and the rows are predicted values.
- sensitivity (numeric) The sensitivity or TPR
- specificity (numeric) The specificity or TNR
- TSS (list) A list of info related to true skill statistic (TSS)
  - cutoff (vector of numeric) A vector of cutoff threshold values
  - tss (vector of numeric) A vector of TSS for each cutoff threshold
  - Recommended threshold (numeric) A recommended threshold according to TSS
  - Optimal TSS (numeric) The best TSS value
- roc (list) A list of ROC values and AUC value
- Jaccard's similarity index (numeric) The Jaccard's similarity index
- Sørensen's similarity index (numeric) The Sørensen's similarity index or F-measure
- Overprediction rate (numeric) The overprediction rate
- Underprediction rate (numeric) The underprediction rate
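
To picture how the TSS items listed above relate to one another, here is a minimal base R sketch that scans a grid of cutoffs and keeps the one with the highest TSS. The helper name `tss_scan_sketch`, the cutoff grid, the element names in the returned list, and the toy vectors are all hypothetical, and the function may choose cutoffs differently internally.

```r
# Hypothetical helper: compute TSS over a grid of cutoffs and report
# the cutoff with the highest TSS.
tss_scan_sketch <- function(occ_pred, bg_pred,
                            cutoff = seq(0, 1, by = 0.1)) {
  tss <- vapply(cutoff, function(th) {
    sensitivity <- mean(occ_pred >= th)  # TP / (TP + FN)
    specificity <- mean(bg_pred < th)    # TN / (TN + FP)
    sensitivity + specificity - 1
  }, numeric(1))
  best <- which.max(tss)  # first maximum if there are ties
  list(
    cutoff = cutoff,
    tss = tss,
    recommended_threshold = cutoff[best],
    optimal_tss = tss[best]
  )
}

# Toy data, made up for illustration only
occ_pred <- c(0.95, 0.85, 0.75, 0.45)
bg_pred  <- c(0.65, 0.35, 0.25, 0.15, 0.05)
res <- tss_scan_sketch(occ_pred, bg_pred)
```

Here `res$cutoff` and `res$tss` have the same length, and `res$recommended_threshold` is the cutoff at which `res$tss` peaks.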

References

Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017
Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3
Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Without background samples or absences
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)

# With background samples
bg_pred <- st_extract(
  mod$prediction, mod$background_samples) %>%
  st_drop_geometry()
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  bg_pred = bg_pred$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)

[Package itsdm version 0.2.1 Index]