ecospat.CCV.modeling {ecospat} | R Documentation |
Runs indivudual species distribuion models with SDMs or ESMs
Description
Creates probabilistic prediction for all species based on SDMs or ESMs and returns their evaluation metrics and variable importances.
Usage
ecospat.CCV.modeling(sp.data,
env.data,
xy,
DataSplitTable=NULL,
DataSplit = 70,
NbRunEval = 25,
minNbPredictors =5,
validation.method = "cross-validation",
models.sdm = c("GLM","RF"),
models.esm = "CTA",
modeling.options.sdm = NULL,
modeling.options.esm = NULL,
ensemble.metric = "AUC",
ESM = "YES",
parallel = FALSE,
cpus = 4,
VarImport = 10,
modeling.id)
Arguments
sp.data |
a data.frame where the rows are sites and the columns are species (values 1,0) |
env.data |
either a data.frame where rows are sites and colums are environmental variables or a SpatRaster of the envrionmental variables |
xy |
two column data.frame with X and Y coordinates of the sites (most be same coordinate system as |
DataSplitTable |
a table providing |
DataSplit |
percentage of dataset observations retained for the model training (only needed if no |
NbRunEval |
number of cross-validatio/split sample runs (only needed if no |
minNbPredictors |
minimum number of occurences [min(presences/Absences] per predicotors needed to calibrate the models |
validation.method |
either "cross-validation" or "split-sample" used to validate the communtiy predictions (only needed if no |
models.sdm |
modeling techniques used for the normal SDMs. Vector of models names choosen among |
models.esm |
modeling techniques used for the ESMs. Vector of models names choosen among |
modeling.options.sdm |
modeling options for the normal SDMs. |
modeling.options.esm |
modeling options for the ESMs. |
ensemble.metric |
evaluation score used to weight single models to build ensembles: |
ESM |
either |
parallel |
should parallel computing be allowed ( |
cpus |
number of cpus to use in parallel computing |
VarImport |
number of permutation runs to evaluate variable importance |
modeling.id |
character, the ID (=name) of modeling procedure. A random number by default |
Details
The basic idea of the community cross-validation (CCV) is to use the same data (sites) for the model calibration/evaluation of all species. This ensures that there is "independent" cross-validation/split-sample data available not only at the individual species level but also at the community level. This is key to allow an unbiased estimation of the ability to predict species assemblages (Scherrer et al. 2018).
The output of the ecospat.CCV.modeling function can then be used to evaluate the species assemblage predictions with the ecospat.CCV.communityEvaluation.bin
or ecospat.CCV.communityEvaluation.prob
functions.
Value
modelling.id |
character, the ID (=name) of modeling procedure |
output.files |
vector with the names of the files written to the hard drive |
speciesData.calibration |
a 3-dimensional array of presence/absence data of all species for the calibration plots used for each run |
speciesData.evaluation |
a 3-dimensional array of presence/absence data of all species for the evaluation plots used for each run |
speciesData.full |
a data.frame of presence/absence data of all species (same as |
DataSplitTable |
a matrix with |
singleSpecies.ensembleEvaluationScore |
a 3-dimensional array of single species evaluation metrics ( |
singleSpecies.ensembleVariableImportance |
a 3-dimensional array of single species variable importance for all predictors |
singleSpecies.calibrationSites.ensemblePredictions |
a 3-dimensional array of the predictions for each species and run at the calibration sites |
singleSpecies.evaluationSites.ensemblePredictions |
a 3-dimensional array of the predictions for each species and run at the evaluation sites |
allSites.averagePredictions.cali |
a matrix with the average predicted probabilities for each site across all the runs the sites were used for model calibration |
allSites.averagePredictions.eval |
a matrix with the average predicted probabilities for each site across all the runs the sites were used as independent evaluation sites |
Author(s)
Daniel Scherrer <daniel.j.a.scherrer@gmail.com> with the updates from Flavien Collart and Olivier Broennimann
References
Scherrer, D., D'Amen, M., Mateo, M.R.G., Fernandes, R.F. & Guisan , A. (2018) How to best threshold and validate stacked species assemblages? Community optimisation might hold the answer. Methods in Ecology and Evolution, in review
See Also
ecospat.CCV.createDataSplitTable
; ecospat.CCV.communityEvaluation.bin
; ecospat.CCV.communityEvaluation.prob
Examples
#Loading species occurence data and remove empty communities
data(ecospat.testData)
testData <- ecospat.testData[,c(24,34,43,45,48,53,55:58)]
sp.data <- testData[which(rowSums(testData)>2), sort(colnames(testData))]
#Loading environmental data
env.data <- ecospat.testData[which(rowSums(testData)>2),4:8]
#Coordinates for all sites
xy <- ecospat.testData[which(rowSums(testData)>2),2:3]
#Running all the models for all species
myCCV.Models <- ecospat.CCV.modeling(sp.data = sp.data,
env.data = env.data,
xy = xy,
NbRunEval = 2,
minNbPredictors = 10,
VarImport = 3)
#Calculating the probabilistic community metrics
metrics = c('SR.deviation','community.AUC','probabilistic.Sorensen','Max.Sorensen')
myCCV.Eval.prob <- ecospat.CCV.communityEvaluation.prob(
ccv.modeling.data = myCCV.Models,
community.metrics = metrics)
#Thresholding all the predictions and calculating the community evaluation metrics
myCCV.communityEvaluation.bin <- ecospat.CCV.communityEvaluation.bin(
ccv.modeling.data = myCCV.Models,
thresholds = c('MAX.KAPPA', 'MAX.ROC','PS_SDM'),
community.metrics= c('SR.deviation','Sorensen'),
parallel = FALSE,
cpus = 4)
#removing files on disk
unlink(list.files(pattern=myCCV.Models$modeling.id))
unlink(myCCV.Models$modeling.id,recursive=TRUE)