bm_CrossValidation {biomod2}    R Documentation
Build cross-validation table
Description
This internal biomod2 function builds a cross-validation table
according to one of 6 different methods: random, kfold, block, strat,
env or user.defined (see Details).
Usage
bm_CrossValidation(
bm.format,
strategy = "random",
nb.rep = 0,
perc = 0.8,
k = 0,
balance = "presences",
env.var = NULL,
strat = "both",
user.table = NULL,
do.full.models = FALSE
)
bm_CrossValidation_user.defined(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_user.defined(bm.format, user.table)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_user.defined(bm.format, user.table)
bm_CrossValidation_random(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_random(bm.format, nb.rep, perc)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_random(bm.format, nb.rep, perc)
bm_CrossValidation_kfold(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_kfold(bm.format, nb.rep, k)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_kfold(bm.format, nb.rep, k)
bm_CrossValidation_block(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_block(bm.format)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_block(bm.format)
bm_CrossValidation_strat(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_strat(bm.format, balance, strat, k)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_strat(bm.format, balance, strat, k)
bm_CrossValidation_env(bm.format, ...)
## S4 method for signature 'BIOMOD.formated.data'
bm_CrossValidation_env(bm.format, balance, k, env.var)
## S4 method for signature 'BIOMOD.formated.data.PA'
bm_CrossValidation_env(bm.format, balance, k, env.var)
Arguments
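| bm.format | a BIOMOD.formated.data or BIOMOD.formated.data.PA object, as returned by BIOMOD_FormatingData |
| strategy | a character corresponding to the cross-validation strategy, must be among random, kfold, block, strat, env or user.defined |
| nb.rep | (optional, default 0) an integer corresponding to the number of times the calibration/validation splitting is repeated (random and kfold) |
| perc | (optional, default 0.8) a numeric between 0 and 1 corresponding to the proportion of data used to calibrate the models (random) |
| k | (optional, default 0) an integer corresponding to the number of partitions (kfold, strat and env) |
| balance | (optional, default 'presences') a character defining which data are balanced between partitions, must be either presences or absences (strat and env) |
| env.var | (optional) a character vector with the names of the environmental variables used to build the partitions; k partitions are built for each variable (env) |
| strat | (optional, default 'both') a character defining along which gradient the data are partitioned, must be among x, y or both (strat) |
| user.table | (optional, default NULL) a matrix or data.frame defining for each repetition (in columns) which observation lines should be used for model calibration (TRUE) and validation (FALSE) (user.defined) |
| do.full.models | (optional, default FALSE) a logical defining whether columns merging all runs (and/or pseudo-absence datasets) are added at the end of the output |
| ... | (optional, one or several of the arguments above, depending on the selected method) |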
Details
Several parameters are available within the function, and each one is only used by some of the cross-validation strategies:

| Parameter | random | kfold | block | strat | env |
|-----------|--------|-------|-------|-------|-----|
| nb.rep    | x      | x     |       |       |     |
| perc      | x      |       |       |       |     |
| k         |        | x     |       | x     | x   |
| balance   |        |       |       | x     | x   |
| strat     |        |       |       | x     |     |
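The parameter table can be encoded as a simple lookup. This is a minimal base-R sketch: `cv_params` and `unused_params()` are hypothetical names, not part of biomod2, and the helper only reports which supplied arguments the table marks as unused by a strategy. It says nothing about how bm_CrossValidation itself treats such arguments.

```r
# Hypothetical lookup mirroring the parameter table (not part of biomod2):
# which arguments each strategy reads
cv_params <- list(
  random = c("nb.rep", "perc"),
  kfold  = c("nb.rep", "k"),
  block  = character(0),
  strat  = c("k", "balance", "strat"),
  env    = c("k", "balance")
)

# Return the supplied argument names that the table marks as unused
unused_params <- function(strategy, supplied) {
  setdiff(supplied, cv_params[[strategy]])
}

unused_params("random", c("nb.rep", "perc", "k"))  # "k"
```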
Concerning column names of the output matrix:
The number of columns depends on the selected strategy.
Column names are assigned after the selection, numbered from 1 to the
number of columns.
If do.full.models = TRUE, columns merging all runs (and/or pseudo-absence datasets)
are added at the end.
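The naming rule can be illustrated in base R. This is a sketch under stated assumptions: `name_cv_columns()` is hypothetical, the `_allData_RUNx` prefix is borrowed from the user-defined format described under Details, and both the all-TRUE merged column and its name `_allData_allRun` are assumptions rather than documented output.

```r
# Hypothetical sketch: number the columns 1..n and, if requested, append a
# merged column (assumption: it flags every row for calibration, and the
# name "_allData_allRun" is illustrative only)
name_cv_columns <- function(tab, do.full.models = FALSE) {
  tab <- as.matrix(tab)
  colnames(tab) <- paste0("_allData_RUN", seq_len(ncol(tab)))
  if (do.full.models) {
    tab <- cbind(tab, "_allData_allRun" = rep(TRUE, nrow(tab)))
  }
  tab
}

m <- matrix(c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE), nrow = 3)
colnames(name_cv_columns(m, do.full.models = TRUE))
```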
Concerning cross-validation strategies:
- random
The simplest way to calibrate and validate a model is to split the original dataset into two datasets: one to calibrate the model and the other one to validate it. The splitting can be repeated nb.rep times.
- k-fold
The k-fold method splits the original dataset into k datasets of equal size: each part is used successively as the validation dataset while the other k-1 parts are used for calibration, leading to k calibration/validation ensembles. This multiple splitting can be repeated nb.rep times.
- block
It may be used to test for model overfitting and to assess transferability in geographic space. block stratification was described in Muscarella et al. 2014 (see References). The data are partitioned into four bins of equal size (bottom-left, bottom-right, top-left and top-right).
- stratified
It may be used to test for model overfitting and to assess transferability in geographic space. x and y stratification was described in Wenger and Olden 2012 (see References). y stratification uses k partitions along the y-gradient, x stratification does the same for the x-gradient. both returns 2k partitions: k partitions stratified along the x-gradient and k partitions stratified along the y-gradient.
- environmental
It may be used to test for model overfitting and to assess transferability in environmental space. It returns k partitions for each variable given in env.var.
- user-defined
Allows the user to provide their own cross-validation table. For a presence-absence dataset, column names must be formatted as _allData_RUNx, with x an integer. For a presence-only dataset for which several pseudo-absence datasets were generated, column names must be formatted as _PAx_RUNy, with x an integer such that PAx is an existing pseudo-absence dataset, and y an integer.
Concerning the balance parameter:
If balance = 'presences', presences are divided (balanced) equally over the partitions
(e.g. Fig. 1b in Muscarella et al. 2014).
Absences or pseudo-absences will however be unbalanced over the partitions, especially if the
presences are clumped on an edge of the study area.
If balance = 'absences', absences (resp. pseudo-absences or background) are divided
(balanced) as equally as possible between the partitions (geographically balanced bins, given
that absences are spread equally over the study area; an approach similar to Fig. 1 in
Wenger and Olden 2012). Presences will however be unbalanced over the partitions, especially
if the presences are clumped on an edge of the study area.
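Balancing one group tends to unbalance the other, because the partition limits are chosen from that group only. The hypothetical `balanced_bands()` sketches this idea along a single gradient: it takes quantiles of the presences (or of the absences), then applies them to every row. It illustrates the trade-off described above and is not biomod2's code.

```r
# Hypothetical sketch: band limits come from the balanced group only
balanced_bands <- function(v, resp, k, balance = c("presences", "absences")) {
  balance <- match.arg(balance)
  ref <- if (balance == "presences") v[resp == 1] else v[resp == 0]
  br <- quantile(ref, probs = seq(0, 1, length.out = k + 1))
  br[1] <- -Inf; br[k + 1] <- Inf   # every row falls in some band
  cut(v, breaks = br, labels = FALSE)
}

# Presences clumped on one edge of the gradient
v    <- c(1:8, 11:18)
resp <- c(rep(0, 8), rep(1, 8))
band <- balanced_bands(v, resp, k = 2, balance = "presences")
table(band[resp == 1])  # presences: 4 in band 1, 4 in band 2
table(band[resp == 0])  # absences: all 8 in band 1
```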
Value
A matrix or data.frame defining for each repetition (in columns) which
observation lines should be used for model calibration (TRUE) and validation
(FALSE).
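Since the returned table and a user-defined table share the same layout, a quick structural check can catch formatting mistakes before a table is passed as user.table. `check_user_table()` is a hypothetical helper: it only verifies that the values are logical and that the column names follow the `_allData_RUNx` or `_PAx_RUNy` patterns given under Details. It does not check that each PAx actually exists.

```r
# Hypothetical helper (not part of biomod2): logical values and column
# names matching _allData_RUNx or _PAx_RUNy
check_user_table <- function(tab) {
  nms <- colnames(tab)
  !is.null(nms) &&
    is.logical(as.matrix(tab)) &&
    all(grepl("^_(allData|PA[0-9]+)_RUN[0-9]+$", nms))
}

calib <- matrix(c(TRUE, TRUE, FALSE, TRUE,
                  FALSE, TRUE, TRUE, TRUE), ncol = 2,
                dimnames = list(NULL, c("_allData_RUN1", "_allData_RUN2")))
check_user_table(calib)  # TRUE

bad <- calib
colnames(bad) <- c("run1", "run2")
check_user_table(bad)    # FALSE
```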
Author(s)
Frank Breiner, Maya Gueguen
References
Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J.M., Uriarte, M. & Anderson, R.P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.
Wenger, S.J. & Olden, J.D. (2012). Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3, 260-267.
See Also
get.block, kfold,
BIOMOD_FormatingData, BIOMOD_Modeling
Other secondary functions:
bm_BinaryTransformation(),
bm_FindOptimStat(),
bm_MakeFormula(),
bm_ModelingOptions(),
bm_PlotEvalBoxplot(),
bm_PlotEvalMean(),
bm_PlotRangeSize(),
bm_PlotResponseCurves(),
bm_PlotVarImpBoxplot(),
bm_PseudoAbsences(),
bm_RunModelsLoop(),
bm_SRE(),
bm_SampleBinaryVector(),
bm_SampleFactorLevels(),
bm_Tuning(),
bm_VariablesImportance()
Examples
library(biomod2)
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
# --------------------------------------------------------------- #
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespXY,
resp.name = myRespName)
# --------------------------------------------------------------- #
# Create the different validation datasets
# random selection
cv.r <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "random",
nb.rep = 3,
perc = 0.8)
# k-fold selection
cv.k <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "kfold",
nb.rep = 2,
k = 3)
# block selection
cv.b <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "block")
# stratified selection (geographic)
cv.s <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "strat",
k = 2,
balance = "presences",
strat = "x")
# stratified selection (environmental)
cv.e <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "env",
k = 2,
balance = "presences")
head(cv.r)
apply(cv.r, 2, table)
head(cv.k)
apply(cv.k, 2, table)
head(cv.b)
apply(cv.b, 2, table)
head(cv.s)
apply(cv.s, 2, table)
head(cv.e)
apply(cv.e, 2, table)