R: Suitability mapping based on ensembles of modelling...

ensemble.raster {BiodiversityR}

R Documentation

Suitability mapping based on ensembles of modelling algorithms: consensus mapping

Description

The basic function ensemble.raster creates two consensus raster layers, one based on a (weighted) average of different suitability modelling algorithms, and a second one documenting the number of modelling algorithms that predict presence of the focal organisms. Modelling algorithms include maximum entropy (MAXENT), boosted regression trees, random forests, generalized linear models (including stepwise selection of explanatory variables), generalized additive models (including stepwise selection of explanatory variables), multivariate adaptive regression splines, regression trees, artificial neural networks, flexible discriminant analysis, support vector machines, the BIOCLIM algorithm, the DOMAIN algorithm and the Mahalonobis algorithm. These sets of functions were developed in parallel with the biomod2 package, especially for inclusion of the maximum entropy algorithm, but also to allow for a more direct integration with the BiodiversityR package, more direct handling of model formulae and greater focus on mapping. Researchers and students of species distribution are strongly encouraged to familiarize themselves with all the options of the biomod2 and dismo packages.

Usage

ensemble.raster(xn = NULL, 
    models.list = NULL, 
    input.weights = models.list$output.weights,
    thresholds = models.list$thresholds,
    RASTER.species.name = models.list$species.name, 
    RASTER.stack.name = xn@title, 
    RASTER.format = "GTiff", RASTER.datatype = "INT2S", RASTER.NAflag = -32767,
    RASTER.models.overwrite = TRUE,
    evaluate = FALSE, SINK = FALSE,
    p = models.list$p, a = models.list$a,
    pt = models.list$pt, at = models.list$at,
    CATCH.OFF = FALSE)

ensemble.habitat.change(base.map = file.choose(), 
    other.maps = utils::choose.files(),
    change.folder = "ensembles/change",
    RASTER.names = "changes",
    RASTER.format = "GTiff", RASTER.datatype = "INT1U", RASTER.NAflag = 255)

ensemble.area(x=NULL, km2=TRUE)

Arguments

`xn`	RasterStack object (`stack`) containing all layers that correspond to explanatory variables of an ensemble calibrated earlier with `ensemble.calibrate.models`. See also `predict`.
`models.list`	list with 'old' model objects such as `MAXENT` or `RF`.
`input.weights`	array with numeric values for the different modelling algorithms; if `NULL` then values provided by parameters such as `MAXENT` and `GBM` will be used. As an alternative, the output from `ensemble.calibrate.weights` can be used.
`thresholds`	array with the threshold values separating predicted presence for each of the algorithms.
`RASTER.species.name`	First part of the names of the raster files that will be generated, expected to identify the modelled species (or organism).
`RASTER.stack.name`	Last part of the names of the raster files that will be generated, expected to identify the predictor stack used.
`RASTER.format`	Format of the raster files that will be generated. See `writeFormats` and `writeRaster`.
`RASTER.datatype`	Format of the raster files that will be generated. See `dataType` and `writeRaster`.
`RASTER.NAflag`	Value that is used to store missing data. See `writeRaster`.
`RASTER.models.overwrite`	Overwrite the raster files that correspond to each suitability modelling algorithm (if `TRUE`). (Overwriting actually implies that raster files are created or overwritten that start with "working_").
`evaluate`	if `TRUE`, then evaluate the created raster layers at locations `p`, `a`, `pt` and `at` (if provided). See also `evaluate`
`SINK`	Append the results to a text file in subfolder 'outputs' (if `TRUE`). The name of file is based on argument `RASTER.species.name`. In case the file already exists, then results are appended. See also `sink`.
`p`	presence points used for calibrating the suitability models, typically available in 2-column (x, y) or (lon, lat) dataframe; see also `prepareData` and `extract`
`a`	background points used for calibrating the suitability models, typically available in 2-column (x, y) or (lon, lat) dataframe; see also `prepareData` and `extract`
`pt`	presence points used for evaluating the suitability models, typically available in 2-column (lon, lat) dataframe; see also `prepareData`
`at`	background points used for calibrating the suitability models, typicall available in 2-column (lon, lat) dataframe; see also `prepareData` and `extract`
`CATCH.OFF`	Disable calls to function `tryCatch`.
`base.map`	filename with baseline map used to produce maps that show changes in suitable habitat
`other.maps`	files with other maps used to produce maps that show changes in suitable habitat from a defined base.map
`change.folder`	folder where new maps documenting changes in suitable habitat will be stored. NOTE: please ensure that the base folder (eg: ../ensembles) exists already.
`RASTER.names`	names for the files in the change.folder (previously set as names of the other maps).
`x`	RasterLayer object (`raster`) in a longitude-latitude coordinate system
`km2`	Provide results in square km rather than square m. See also `areaPolygon`

Details

The basic function ensemble.raster fits individual suitability models for all models with positive input weights. In subfolder "models" of the working directory, suitability maps for the individual suitability modelling algorithms are stored. In subfolder "ensembles", a consensus suitability map based on a weighted average of individual suitability models is stored. In subfolder "ensembles/presence", a presence-absence (1-0) map will be provided. In subfolder "ensembles/count", a consensus suitability map based on the number of individual suitability models that predict presence of the focal organism is stored.

Note that values in suitability maps are integer values that were calculated by multiplying probabilities by 1000 (see also trunc).

The ensemble.habitat.change function produces new raster layers that show changes in suitable and not suitable habitat between a base raster and a list of other rasters. The output uses the following coding: 0 = areas that remain unsuitable, 11 = areas that remain suitable, 10 = areas of lost habitat, 1 = areas of new habitat. (Codes are inspired on a binary classification of habitat suitability in base [1- or 0-] and other layer [-1 or -0], eg new habitat is coded 01 = 1).

The ensemble.area function calculates the area of different categories with areaPolygon

Value

The basic function ensemble.raster mainly results in the creation of raster layers that correspond to fitted probabilities of presence of individual suitability models (in folder "models") and consensus models (in folder "ensembles"), and the number of suitability models that predict presence (in folder "ensembles"). Prediction of presence is based on a threshold usually defined by maximizing the sum of the true presence and true absence rates (see threshold.method and also ModelEvaluation).

If desired by the user, the ensemble.raster function also saves details of fitted suitability models or data that can be plotted with the evaluation.strip.plot function.

Author(s)

Roeland Kindt (World Agroforestry Centre), Eike Luedeling (World Agroforestry Centre) and Evert Thomas (Bioversity International)

References

Kindt R. 2018. Ensemble species distribution modelling with transformed suitability values. Environmental Modelling & Software 100: 136-145. doi:10.1016/j.envsoft.2017.11.009

Buisson L, Thuiller W, Casajus N, Lek S and Grenouillet G. 2010. Uncertainty in ensemble forecasting of species distribution. Global Change Biology 16: 1145-1157

Examples

## Not run: 
# based on examples in the dismo package

# get predictor variables
library(dismo)
predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
    pattern='grd', full.names=TRUE)
predictors <- stack(predictor.files)
# subset based on Variance Inflation Factors
predictors <- subset(predictors, subset=c("bio5", "bio6", 
    "bio16", "bio17"))
predictors
predictors@title <- "base"

# presence points
# presence points
presence_file <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
pres <- read.table(presence_file, header=TRUE, sep=',')[,-1]

# choose background points
background <- randomPoints(predictors, n=1000, extf = 1.00)

# if desired, change working directory where subfolders of "models" and 
# "ensembles" will be created
# raster layers will be saved in subfolders of /models and /ensembles:
getwd()

# first calibrate the ensemble
# calibration is done in two steps
# in step 1, a k-fold procedure is used to determine the weights
# in step 2, models are calibrated for all presence and background locations
# factor is not used as it is not certain whether correct levels will be used
# it may therefore be better to use dummy variables

# step 1: determine weights through 4-fold cross-validation
ensemble.calibrate.step1 <- ensemble.calibrate.weights(
    x=predictors, p=pres, a=background, k=4, 
    SINK=TRUE, species.name="Bradypus",
    MAXENT=0, MAXNET=1, MAXLIKE=1, GBM=1, GBMSTEP=0, RF=1, CF=1,
    GLM=1, GLMSTEP=1, GAM=1, GAMSTEP=1, MGCV=1, MGCVFIX=1, 
    EARTH=1, RPART=1, NNET=1, FDA=1, SVM=1, SVME=1, GLMNET=1,
    BIOCLIM.O=1, BIOCLIM=1, DOMAIN=1, MAHAL=0, MAHAL01=1,
    ENSEMBLE.tune=TRUE, PROBIT=TRUE,
    ENSEMBLE.best=0, ENSEMBLE.exponent=c(1, 2, 3),
    ENSEMBLE.min=c(0.65, 0.7),
    Yweights="BIOMOD",
    PLOTS=FALSE, formulae.defaults=TRUE)

# step 1 generated the weights for each algorithm
model.weights <- ensemble.calibrate.step1$output.weights
x.batch <- ensemble.calibrate.step1$x
p.batch <- ensemble.calibrate.step1$p
a.batch <- ensemble.calibrate.step1$a
MAXENT.a.batch <- ensemble.calibrate.step1$MAXENT.a
factors.batch <- ensemble.calibrate.step1$factors
dummy.vars.batch <- ensemble.calibrate.step1$dummy.vars

# step 2: calibrate models with all available presence locations
# weights determined in step 1 calculate ensemble in step 2
ensemble.calibrate.step2 <- ensemble.calibrate.models(
    x=x.batch, p=p.batch, a=a.batch, MAXENT.a=MAXENT.a.batch, 
    factors=factors.batch, dummy.vars=dummy.vars.batch, 
    SINK=TRUE, species.name="Bradypus",
    models.keep=TRUE,
    input.weights=model.weights,
    ENSEMBLE.tune=FALSE, PROBIT=TRUE,
    Yweights="BIOMOD",
    PLOTS=FALSE, formulae.defaults=TRUE)

# step 3: use previously calibrated models to create ensemble raster layers
# re-evaluate the created maps at presence and background locations
# (note that re-evaluation will be different due to truncation of raster layers
# as they wered saved as integer values ranged 0 to 1000)
ensemble.raster.results <- ensemble.raster(xn=predictors, 
    models.list=ensemble.calibrate.step2$models, 
    input.weights=model.weights,
    SINK=TRUE, evaluate=TRUE,
    RASTER.species.name="Bradypus", RASTER.stack.name="base")

# use the base map to check for changes in suitable habitat
# this type of analysis is typically done with different predictor layers
# (for example, predictor layers representing different possible future climates)
# In this example, changes from a previous model (ensemble.raster.results)
# are contrasted with a newly calibrated model (ensemble.raster.results2)
# step 1: 4-fold cross-validation
ensemble.calibrate2.step1 <- ensemble.calibrate.weights(
    x=x.batch, p=p.batch, a=a.batch, MAXENT.a=MAXENT.a.batch, 
    factors=factors.batch, dummy.vars=dummy.vars.batch, 
    k=4, 
    SINK=TRUE, species.name="Bradypus",
    MAXENT=0, MAXNET=1, MAXLIKE=1, GBM=1, GBMSTEP=0, RF=1, CF=1,
    GLM=1, GLMSTEP=1, GAM=1, GAMSTEP=1, MGCV=1, MGCVFIX=1, 
    EARTH=1, RPART=1, NNET=1, FDA=1, SVM=1, SVME=1, GLMNET=1,
    BIOCLIM.O=1, BIOCLIM=1, DOMAIN=1, MAHAL=0, MAHAL01=1,
    ENSEMBLE.tune=TRUE, PROBIT=TRUE,
    ENSEMBLE.best=0, ENSEMBLE.exponent=c(1, 2, 3),
    ENSEMBLE.min=c(0.65, 0.7),
    Yweights="BIOMOD",
    PLOTS=FALSE, formulae.defaults=TRUE)

model.weights2 <- ensemble.calibrate2.step1$output.weights

ensemble.calibrate2.step2 <- ensemble.calibrate.models(
    x=x.batch, p=p.batch, a=a.batch, MAXENT.a=MAXENT.a.batch, 
    factors=factors.batch, dummy.vars=dummy.vars.batch, 
    SINK=TRUE, species.name="Bradypus",
    models.keep=TRUE,
    input.weights=model.weights2,
    ENSEMBLE.tune=FALSE, PROBIT=TRUE,
    Yweights="BIOMOD",
    PLOTS=FALSE, formulae.defaults=TRUE)

ensemble.raster.results2 <- ensemble.raster(
    xn=predictors, 
    models.list=ensemble.calibrate2.step2$models, 
    input.weights=model.weights2,
    SINK=TRUE, evaluate=TRUE,
    RASTER.species.name="Bradypus", RASTER.stack.name="recalibrated")

base.file <- paste(getwd(), "/ensembles/presence/Bradypus_base.tif", sep="")
other.file <- paste(getwd(), "/ensembles/presence/Bradypus_recalibrated.tif", sep="")

changed.habitat <- ensemble.habitat.change(base.map=base.file, 
    other.maps=c(other.file),
    change.folder="ensembles/change",
    RASTER.names="Bradypus_recalibrated")

change.file <- paste(getwd(), "/ensembles/change/Bradypus_recalibrated.tif", sep="")

par.old <- graphics::par(no.readonly=T)
dev.new()
par(mfrow=c(2,2))
raster::plot(raster(base.file), breaks=c(-1, 0, 1), col=c("grey", "green"), 
    legend.shrink=0.8, main="base presence")
raster::plot(raster(other.file), breaks=c(-1, 0, 1), col=c("grey", "green"), 
    legend.shrink=0.8, main="other presence")
raster::plot(raster(change.file), breaks=c(-1, 0, 1, 10, 11), 
    col=c("grey", "blue", "red", "green"), 
    legend.shrink=0.8, main="habitat change", sub="11 remaining, 10 lost, 1 new")
graphics::par(par.old)

areas <- ensemble.area(raster(change.file))
areas

## End(Not run)

[Package BiodiversityR version 2.16-1 Index]