R: Evaluation of species distribution models using the pooling...

ecospat.poolingEvaluation {ecospat}

R Documentation

Evaluation of species distribution models using the pooling procedure

Description

This function evaluates species distribution models using 100% of the dataset by pooling the different runs of the cross validation as in Collart et al. 2021

Usage

  ecospat.poolingEvaluation(fit,
                            calib,
                            resp,
                            AlgoName = NULL,
                            metrics = c("SomersD","AUC","MaxTSS","MaxKappa","Boyce"),
                            ensembleEvaluation=FALSE,
                            w=NULL,
                            metricToEnsemble = "MaxTSS")

Arguments

`fit`	a list containing n data.frame or matrix, where n corresponds to the number of algorithm you want to evaluate. The data.frames (matrices) need to contain the model predictions (ranging between 0 and 1) resulting from the different runs of cross-validation. These data.frame need to have the same number of rows as in the full dataset (100% of the occurrences and 100% of the absences or background points) and a number of column equal to the number of cross-validation runs
`calib`	a logical matrix with a number of rows equal to the full dataset and a number of column corresponding to the number of cross-validation runs. The value TRUE is to mention the elements that where used to calibrate the models whereas FALSE corresponds to the one that will be used for the evaluation (NB the points used to calibrate the models during a cross-validation run should be the same across algoritms)
`resp`	a numeric vector where 1 corresponds to a species response and 0 to an absence (or background point) with a length corresponding to number of rows in calib
`AlgoName`	a character vector for giving a name to each elements of fit. If NULL, the position in the list will be used instead.
`metrics`	a vector of evaluation metrics chosen among "SomersD", "AUC", "MaxTSS", "MaxKappa", "Boyce"
`ensembleEvaluation`	logical. If TRUE, the ensemble model will be evaluated applying a weighted mean across algorithms.
`w`	a numeric vector of the weights used to realize the ensemble model. The length should match the number of algorithms.
`metricToEnsemble`	character. Metric to use to ensemble the models with a weighted mean when w is not given. The metric should be one in metrics

Details

Because a minimum sample size is needed to evaluate models (see Collart & Guisan,2023; Jiménez-Valverde, 2020), this function uses the approach from Collart et al.(2021), which consists to pool the suitability values of the hold-out data (evaluation dataset) across replicates. As the same data point (presence or absence or background point) is presumably sampled in several replicates, the suitability values for each data point is consequently averaged across replicates where they were sampled. This procedure generates a series of independent suitability values with a size approximately equal (as some data points may not have been sampled by chance in any of the n replicates) to that of the number of data point.

Value

a list containing:

`evaluations`	a matrix with the evaluation scores based on the different modelling algorithms and based on the consensus across the modelling algorithms (called here "ensemble")
`fit`	a matrix of predicted values resulting from the pooling procedure and used to compute the evaluation scores. The column resp is where the species occurs or not

Author(s)

Flavien Collart flavien.collart@unil.ch

with contributions of Olivier Broennimann olivier.broennimann@unil.ch

References

Collart, F., & Guisan, A. (2023). Small to train, small to test: Dealing with low sample size in model evaluation. Ecological Informatics. 75, 102106. doi:10.1016/j.ecoinf.2023.102106

Collart, F., Hedenäs, L., Broennimann, O., Guisan, A. and Vanderpoorten, A. 2021. Intraspecific differentiation: Implications for niche and distribution modelling. Journal of Biogeography. 48, 415-426. doi:10.1111/jbi.14009

Jiménez-Valverde, A. 2020. Sample size for the evaluation of presence-absence models. Ecological Indicators. 114, 106289. doi:10.1016/j.ecolind.2020.106289

Examples

  set.seed(42)
  resp <- c(rep(1,15),rep(0,85)) #15 presences and 85 absences
  #Generating a fake fit object whith two algorithms and 3 cross-vaidation
  fit <- list(matrix(0,nc=3,nr=100),
              matrix(0,nc=3,nr=100))
  fit[[1]][1:15,] = sample(seq(0,1, by=0.01),15*3,prob=c(rep(1,51),rep(10,50)),replace=TRUE)
  fit[[2]][1:15,] = sample(seq(0,1, by=0.01),15*3,prob=c(rep(1,51),rep(10,50)),replace=TRUE)
  fit[[1]][16:100,] = sample(seq(0,1, by=0.01),85*3,prob=c(rep(10,51),rep(1,50)),replace=TRUE)
  fit[[2]][16:100,] = sample(seq(0,1, by=0.01),85*3,prob=c(rep(10,51),rep(1,50)),replace=TRUE)
  
  # Generating a calib object where 80% of the dataset is used to calibrate the model 
  # and 20% to evaluate it
  calib <- matrix(TRUE,nc=3,nr=100)
  calib[c(sample(1:15,3),sample(16:100,17)),1]=FALSE 
  calib[c(sample(1:15,3),sample(16:100,17)),2]=FALSE 
  calib[c(sample(1:15,3),sample(16:100,17)),3]=FALSE 
  
  # Evaluation via the pooling procedure
  eval <- ecospat.poolingEvaluation(fit=fit,calib=calib,resp=resp,metrics=c("AUC","MaxTSS"))
  eval$evaluations
  
  # Evaluation including the ensemble model based on a weighted mean using MaxTSS
  evalEns <- ecospat.poolingEvaluation(fit=fit,calib=calib,resp=resp,ensembleEvaluation=TRUE,
                                       metrics=c("AUC","MaxTSS"))
  evalEns$evaluations
  
  # Evaluation including the ensemble model based on a mean by giving the same weight for 
  # each algorithm
  evalEns <- ecospat.poolingEvaluation(fit=fit,calib=calib,resp=resp,ensembleEvaluation=TRUE,
                                       metrics=c("AUC","MaxTSS"),w=c(1,1))
  evalEns$evaluations

[Package ecospat version 4.1.1 Index]