BIOMOD_Modeling {biomod2} | R Documentation
Run a range of species distribution models
Description
This function calibrates and evaluates a range of modeling techniques for a given species distribution. The dataset can be split into calibration and validation parts, and the predictive power of the different models can be estimated with a range of evaluation metrics (see Details).
Usage
BIOMOD_Modeling(
  bm.format,
  modeling.id = as.character(format(Sys.time(), "%s")),
  models = c("GLM", "GBM", "GAM", "CTA", "ANN", "SRE", "FDA", "MARS", "RF", "MAXENT",
             "MAXNET", "XGBOOST"),
  models.pa = NULL,
  bm.options = NULL,
  CV.strategy = "random",
  CV.nb.rep = 1,
  CV.perc = NULL,
  CV.k = NULL,
  CV.balance = NULL,
  CV.env.var = NULL,
  CV.strat = NULL,
  CV.user.table = NULL,
  CV.do.full.models = FALSE,
  nb.rep,
  data.split.perc,
  data.split.table,
  do.full.models,
  weights = NULL,
  prevalence = NULL,
  metric.eval = c("KAPPA", "TSS", "ROC"),
  var.import = 0,
  scale.models = FALSE,
  nb.cpu = 1,
  seed.val = NULL,
  do.progress = TRUE
)
Arguments
bm.format |
a |
modeling.id |
a |
models |
a |
models.pa |
(optional, default |
bm.options |
a |
CV.strategy |
a |
CV.nb.rep |
(optional, default |
CV.perc |
(optional, default |
CV.k |
(optional, default |
CV.balance |
(optional, default |
CV.env.var |
(optional) |
CV.strat |
(optional, default |
CV.user.table |
(optional, default |
CV.do.full.models |
(optional, default |
nb.rep |
deprecated, now called |
data.split.perc |
deprecated, now called |
data.split.table |
deprecated, now called |
do.full.models |
deprecated, now called |
weights |
(optional, default |
prevalence |
(optional, default |
metric.eval |
a |
var.import |
(optional, default |
scale.models |
(optional, default |
nb.cpu |
(optional, default |
seed.val |
(optional, default |
do.progress |
(optional, default |
Details
- bm.format
If you have decided to add pseudo-absences to your original dataset (see BIOMOD_FormatingData), PA.nb.rep * (nb.rep + 1) models will be created.
- models
The set of models to be calibrated on the data. 12 modeling techniques are currently available:
  - GLM: Generalized Linear Model (glm)
  - GAM: Generalized Additive Model (gam::gam, mgcv::gam or mgcv::bam; see BIOMOD_ModelingOptions for details on algorithm selection)
  - GBM: Generalized Boosting Model, usually called Boosted Regression Trees (gbm)
  - CTA: Classification Tree Analysis (rpart)
  - ANN: Artificial Neural Network (nnet)
  - SRE: Surface Range Envelope, usually called BIOCLIM
  - FDA: Flexible Discriminant Analysis (fda)
  - MARS: Multivariate Adaptive Regression Splines (earth)
  - RF: Random Forest (randomForest)
  - MAXENT: Maximum Entropy (https://biodiversityinformatics.amnh.org/open_source/maxent/)
  - MAXNET: Maximum Entropy (maxnet)
  - XGBOOST: eXtreme Gradient Boosting Training (xgboost)
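The model count given under bm.format can be checked by enumerating the combinations in base R. This is only a sketch: the settings are made up, the formula is assumed to apply to each algorithm separately, the "+ 1" is assumed to stand for one extra model calibrated on the whole dataset, and the labels (PA1, RUN1, allRun) are illustrative rather than taken from biomod2 output.

```r
# Hypothetical settings, chosen only for illustration
PA.nb.rep <- 2                 # number of pseudo-absence datasets
nb.rep    <- 3                 # number of calibration/validation repetitions
models    <- c("GLM", "RF")    # algorithms to calibrate

# Formula from the bm.format entry: PA.nb.rep * (nb.rep + 1)
n.per.algo <- PA.nb.rep * (nb.rep + 1)

# ASSUMPTION: the "+ 1" is one extra run using the whole dataset,
# labelled "allRun" here for illustration only
runs <- c(paste0("RUN", seq_len(nb.rep)), "allRun")

# One row per pseudo-absence dataset x run x algorithm
combos <- expand.grid(PA = paste0("PA", seq_len(PA.nb.rep)),
                      run = runs,
                      algo = models,
                      stringsAsFactors = FALSE)

n.per.algo      # models per algorithm
nrow(combos)    # models across all selected algorithms
```

Under these assumptions the total grows multiplicatively with the number of pseudo-absence datasets, repetitions and algorithms, which is worth checking before launching a long run.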
- models.pa
Different models might respond differently to different numbers of pseudo-absences. It is possible to create sets of pseudo-absences with different numbers of points (see BIOMOD_FormatingData) and to assign only some of these datasets to each single model.
- CV.[...] parameters
Different methods are available to calibrate/validate the single models (see bm_CrossValidation).
- weights & prevalence
More or less weight can be given to some specific observations.
  - If weights = prevalence = NULL, each observation (presence or absence) will have the same weight, no matter the total number of presences and absences.
  - If prevalence = 0.5, presences and absences will be weighted equally (i.e. the weighted sum of presences equals the weighted sum of absences).
  - If prevalence is set below (above) 0.5, more weight will be given to absences (presences).
  - If weights is defined, the prevalence argument will be ignored, and each observation will have its own weight.
  - If pseudo-absences have been generated (PA.nb.rep > 0 in BIOMOD_FormatingData), weights are by default calculated such that prevalence = 0.5. Automatically created weights will be integer values to prevent some modeling issues.
- metric.eval
  - ROC: Relative Operating Characteristic
  - KAPPA: Cohen's Kappa (Heidke skill score)
  - TSS: True skill statistic (Hanssen and Kuipers discriminant, Peirce's skill score)
  - FAR: False alarm ratio
  - SR: Success ratio
  - ACCURACY: Accuracy (fraction correct)
  - BIAS: Bias score (frequency bias)
  - POD: Probability of detection (hit rate)
  - CSI: Critical success index (threat score)
  - ETS: Equitable threat score (Gilbert skill score)
The optimal value of each metric can be obtained with the get_optim_value function. Several evaluation metrics can be selected. Please refer to the CAWRC website (section "Methods for dichotomous forecasts") for a detailed description of each metric.
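For readers who want to see how these scores relate to each other, the threshold-based metrics can be computed from a 2 x 2 confusion matrix with the standard dichotomous-forecast formulas. The function below is a hypothetical base R sketch, not the code biomod2 uses, and the data are made up. ROC is left out because it needs continuous predictions rather than a single presence/absence classification.

```r
# Hypothetical helper: threshold-based scores from observed and
# predicted presence/absence vectors (1 = presence, 0 = absence)
eval_metrics <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  hits <- sum(pred == 1 & obs == 1)   # predicted presence, observed presence
  fa   <- sum(pred == 1 & obs == 0)   # false alarms
  miss <- sum(pred == 0 & obs == 1)   # missed presences
  cn   <- sum(pred == 0 & obs == 0)   # correct negatives
  n    <- hits + fa + miss + cn

  # Agreement expected by chance (used by KAPPA and ETS)
  exp.correct <- ((hits + miss) * (hits + fa) + (cn + miss) * (cn + fa)) / n
  hits.random <- (hits + miss) * (hits + fa) / n

  c(ACCURACY = (hits + cn) / n,
    BIAS     = (hits + fa) / (hits + miss),
    POD      = hits / (hits + miss),
    FAR      = fa / (hits + fa),
    SR       = hits / (hits + fa),
    CSI      = hits / (hits + miss + fa),
    ETS      = (hits - hits.random) / (hits + miss + fa - hits.random),
    TSS      = hits / (hits + miss) - fa / (fa + cn),
    KAPPA    = (hits + cn - exp.correct) / (n - exp.correct))
}

# Made-up data: 60 observed presences, 40 observed absences
obs  <- c(rep(1, 60), rep(0, 40))
pred <- c(rep(1, 40), rep(0, 20), rep(1, 10), rep(0, 30))

m <- eval_metrics(obs, pred)
round(m, 3)
```

TSS here is sensitivity plus specificity minus 1, so it does not depend on the proportion of presences in the data, whereas KAPPA and ETS correct for the agreement expected by chance.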
- scale.models
This parameter is experimental and its use is not recommended. It may reduce the amplitude of the projection scale. Some categorical models always have to be scaled (FDA, ANN), but it may be interesting to scale all computed models to ensure comparable predictions (0-1000 range). This might be particularly useful when doing ensemble forecasting, to remove the scale prediction effect (the more extended projections are, the more they influence ensemble forecasting results).
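The scale prediction effect mentioned above can be illustrated with two made-up prediction vectors. The sketch uses a plain min-max rescaling to the 0-1000 range as a stand-in: this is an assumption for illustration and is not the procedure biomod2 applies when scale.models = TRUE. The helper name rescale_0_1000 is hypothetical.

```r
# Hypothetical predictions from two models for the same four sites
pred.narrow <- c(0.45, 0.50, 0.55, 0.48)   # predictions span a narrow range
pred.wide   <- c(0.05, 0.95, 0.40, 0.60)   # predictions span a wide range

# Min-max rescaling to integers in 0-1000 (illustrative stand-in only,
# NOT biomod2's scaling procedure)
rescale_0_1000 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  stopifnot(diff(rng) > 0)   # a constant vector cannot be rescaled this way
  as.integer(round(1000 * (x - rng[1]) / diff(rng)))
}

# Unscaled mean: the wide-range model dominates, site 4 ranks above site 3
raw.mean <- (pred.narrow + pred.wide) / 2

# Mean after rescaling: both models contribute over the same range,
# and site 3 now ranks above site 4
scaled.mean <- (rescale_0_1000(pred.narrow) + rescale_0_1000(pred.wide)) / 2

rank(raw.mean)
rank(scaled.mean)
```

The ranking of sites 3 and 4 switches once both models are expressed on the same range, which is the kind of influence the scale.models entry warns about for ensemble forecasting.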
Value
A BIOMOD.models.out object containing model outputs, or links to saved outputs.
Model outputs are stored outside R (for memory storage reasons) in 2 different folders created in the current working directory:
- a models folder, named after the resp.name argument of BIOMOD_FormatingData, and containing all calibrated models for each repetition and pseudo-absence run
- a hidden folder, named .BIOMOD_DATA, and containing output-related files (original dataset, calibration lines, pseudo-absences selected, predictions, variables importance, evaluation values...), that can be retrieved with get_[...] or load functions, and used by other biomod2 functions, like BIOMOD_Projection or BIOMOD_EnsembleModeling
Author(s)
Wilfried Thuiller, Damien Georges, Robin Engler
See Also
glm, gam, gam, bam, gbm, rpart, nnet, fda, earth, randomForest, maxnet, xgboost, BIOMOD_FormatingData, BIOMOD_ModelingOptions, bm_CrossValidation, bm_VariablesImportance, BIOMOD_Projection, BIOMOD_EnsembleModeling, bm_PlotEvalMean, bm_PlotEvalBoxplot, bm_PlotVarImpBoxplot, bm_PlotResponseCurves
Other Main functions:
BIOMOD_EnsembleForecasting(), BIOMOD_EnsembleModeling(), BIOMOD_FormatingData(), BIOMOD_LoadModels(), BIOMOD_ModelingOptions(), BIOMOD_PresenceOnly(), BIOMOD_Projection(), BIOMOD_RangeSize(), BIOMOD_Tuning()
Examples
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
# ---------------------------------------------------------------------------- #
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
# Create default modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()
# ---------------------------------------------------------------------------- #
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'AllModels',
                                    models = c('RF', 'GLM'),
                                    bm.options = myBiomodOptions,
                                    CV.strategy = 'random',
                                    CV.nb.rep = 2,
                                    CV.perc = 0.8,
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2,
                                    seed.val = 42)
myBiomodModelOut
# Get evaluation scores & variables importance
get_evaluations(myBiomodModelOut)
get_variables_importance(myBiomodModelOut)
# Represent evaluation scores
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'calibration')
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'validation')
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'run'))
# # Represent variables importance
# bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'algo'))
# bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'dataset'))
# bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'expl.var', 'dataset'))
# # Represent response curves
# mods <- get_built_models(myBiomodModelOut, run = 'RUN1')
# bm_PlotResponseCurves(bm.out = myBiomodModelOut,
# models.chosen = mods,
# fixed.var = 'median')
# bm_PlotResponseCurves(bm.out = myBiomodModelOut,
# models.chosen = mods,
# fixed.var = 'min')
# mods <- get_built_models(myBiomodModelOut, full.name = 'GuloGulo_allData_RUN2_RF')
# bm_PlotResponseCurves(bm.out = myBiomodModelOut,
# models.chosen = mods,
# fixed.var = 'median',
# do.bivariate = TRUE)