ensemble {sdm} | R Documentation |
Ensemble Forecasting of SDMs
Description
Makes a Raster object by weighted averaging over the predictions from the fitted models in an sdmModels object.
Usage
## S4 method for signature 'sdmModels'
ensemble(x, newdata, filename="",setting,overwrite=FALSE,pFilename="",...)
Arguments
x |
an sdmModels object |
newdata |
raster object or data.frame, can be either predictors or the results of the |
filename |
optional character, output file name (if newdata is raster object) |
setting |
list, contains the parameters that are used in the ensemble procedure; see details |
overwrite |
logical, whether existing filename is overwritten (if exists and filename is given) |
pFilename |
it is ignored if newdata is the output of |
... |
additional arguments pass to the |
Details
The ensemble function uses the fitted models in an sdmModels object to generate an ensemble (consensus) of the predictions from multiple individual models. Several ensemble methods are available and can be selected through the setting argument.
The setting argument takes a list that can include the following items:
- method
: a character vector that specifies which ensemble method(s) should be used (several can be given at once). The available methods are described below.
- stat
: if method='weighted' is used, specifies which evaluation metric is used as the weight in the weighted averaging procedure. Alternatively, the weights can be supplied directly (see the next item).
- weights
: an optional numeric vector (with a length equal to the number of successfully fitted models) that specifies the weights for the weighted averaging procedure (if method='weighted' is specified).
- id
: specifies the model IDs that should be considered in the ensemble procedure. If missing, all the models that are successfully fitted are considered.
- expr
: a character or an expression that specifies a condition for selecting the models used in the ensemble procedure. For example, expr='auc > 0.7' uses only the models with an AUC greater than 0.7, and expr='auc > 0.7 & tss > 0.5' subsets the models based on both the AUC and TSS metrics.
- wtest
: specifies which dataset ("training", "test.dep", "test.indep") should be used to extract the statistic (stat) values used as weights (if a relevant method is specified).
- opt
: if a threshold-based metric is selected in stat or in expr, opt specifies the threshold selection criterion. Possible values are 1 to 14, corresponding to the criteria "sp=se", "max(se+sp)", "min(cost)", "minROCdist", "max(kappa)", "max(ppv+npv)", "ppv=npv", "max(NMI)", "max(ccr)", "prevalence", "P10", "P5", "P1", "P0", respectively.
- power
: a numeric value to which the weights are raised (default: 1). A value greater than 1 changes the weighting scheme (for methods such as 'weighted') so that models with greater weights contribute even more. For example, if the weights are c(0.2, 0.2, 0.2, 0.4), raising them to the power of 2 and rescaling gives the new weights c(0.1428571, 0.1428571, 0.1428571, 0.5714286), so the better-performing models contribute more to the ensemble output.
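The arithmetic in the power example can be checked with a few lines of base R. This is only a sketch: it assumes the raised weights are rescaled to sum to 1 (which reproduces the numbers quoted above), the variable names are illustrative, and the package's internal implementation may differ.

```r
# Weights from the example above (illustrative values)
w <- c(0.2, 0.2, 0.2, 0.4)

# Raise the weights to the chosen power, then rescale so they sum to 1
# (the rescaling step is an assumption that reproduces the quoted numbers)
power <- 2
w_pow <- w^power
w_new <- w_pow / sum(w_pow)

print(round(w_new, 7))
# 0.1428571 0.1428571 0.1428571 0.5714286
```

With power = 1 the weights are unchanged after rescaling; larger values shift more of the total weight to the best-performing model.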
The available ensemble methods (to be specified in method) include:
– 'unweighted': unweighted averaging/mean.
– 'weighted': weighted averaging.
– 'median': median.
– 'pa': mean of predicted presence-absence values. The predicted probabilities are first converted to presence-absence using a threshold (opt defines which threshold optimisation strategy is used) and are then averaged.
– 'mean-weighted': a two-step averaging that can be used when several replications are available for each modelling method (e.g., fitted through bootstrapping or cross-validation resampling). It first takes an unweighted mean over the predicted values of the replications of each method (within-model averaging), then uses a weighted mean to combine the probabilities of the different methods (between-model averaging).
– 'mean-unweighted': same as the previous one, but an unweighted mean is also used in the second step (instead of a weighted mean).
– 'median-weighted': same as 'mean-weighted', but the median is used in the first step.
– 'median-unweighted': another two-step method; the median is used in the first step and an unweighted mean in the second step.
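The two-step idea behind 'mean-weighted' can be sketched in base R. The prediction values, method labels, and weights below are all made up for illustration, and the code is not taken from the package source; it only mirrors the within-model and between-model steps described above.

```r
# Hypothetical predicted probabilities at 3 sites (rows) from 2 methods,
# each fitted with 2 replications (columns); all values are made up
pred <- cbind(rf_1  = c(0.9, 0.2, 0.6), rf_2  = c(0.7, 0.4, 0.6),
              svm_1 = c(0.5, 0.1, 0.8), svm_2 = c(0.7, 0.3, 0.4))
method <- c("rf", "rf", "svm", "svm")  # modelling method of each column

# Step 1 (within-model averaging): unweighted mean over the replications of each method
step1 <- sapply(unique(method),
                function(m) rowMeans(pred[, method == m, drop = FALSE]))

# Step 2 (between-model averaging): weighted mean across methods;
# the weights are hypothetical (e.g., a mean AUC per method)
w <- c(rf = 0.9, svm = 0.6)
w <- w[colnames(step1)] / sum(w)
ens <- as.vector(step1 %*% w)

print(ens)
```

Replacing rowMeans with a row-wise median in step 1 gives the 'median-weighted' variant, and using equal weights in step 2 gives the '-unweighted' variants.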
In addition to the ensemble methods, some other methods are available that generate outputs representing uncertainty:
– 'uncertainty' or 'entropy': generates the uncertainty among the models' predictions, which can be interpreted as model-based uncertainty or inconsistency among different models. It ranges between 0 and 1: 0 means all the models predicted the same value (either presence or absence), and 1 refers to maximum uncertainty, e.g., half of the models predicted presence (or absence) and the other half predicted the opposite value.
– 'cv': Coefficient of variation of probabilities generated from multiple models
– 'stdev': Standard deviation of probabilities generated from multiple models
– 'ci': generates the length of the confidence interval (twice the margin of error), i.e., the difference between the upper and lower limits of the confidence interval (upper - lower), for each pixel. The default confidence level is 95% (i.e., alpha = 0.05), unless a different alpha is defined in setting. If separate upper and lower rasters are needed, the limits can be calculated with the following code:
# take the unweighted average and the ci
en <- ensemble(x, newdata, setting=list(method=c('mean','ci')))
# en[[1]] is the mean of all probabilities and en[[2]] is the ci
# add the margin of error (half of the generated ci) to the mean
ci.upper <- en[[1]] + en[[2]] / 2
# subtract the margin of error from the mean
ci.lower <- en[[1]] - en[[2]] / 2
plot(ci.upper,main='Upper limit of Confidence Interval - alpha = 0.05')
plot(ci.lower,main='Lower limit of Confidence Interval - alpha = 0.05')
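For intuition about the 'uncertainty'/'entropy', 'stdev' and 'cv' outputs, the following base R sketch computes comparable quantities on made-up values. It assumes that the uncertainty behaves like a binary Shannon entropy (log base 2) of the proportion of models predicting presence, which matches the 0 to 1 range described above; this is an assumption and not necessarily the formula used by the package. The helper binary_entropy is hypothetical.

```r
# Hypothetical presence (1) / absence (0) predictions of 4 models at 3 sites
pa <- rbind(site1 = c(1, 1, 1, 1),   # all models agree
            site2 = c(1, 1, 0, 0),   # models split half and half
            site3 = c(1, 0, 0, 0))

# Binary Shannon entropy (log base 2) of the proportion of models predicting presence
# (assumed formulation; hypothetical helper, not a function of the sdm package)
binary_entropy <- function(p) {
  h <- -(p * log2(p) + (1 - p) * log2(1 - p))
  h[p == 0 | p == 1] <- 0   # define 0 * log2(0) as 0
  h
}
entropy <- binary_entropy(rowMeans(pa))

# Hypothetical probabilities from the same 4 models, for 'stdev' and 'cv'
prob <- rbind(site1 = c(0.9, 0.8, 0.9, 0.8),
              site2 = c(0.7, 0.6, 0.3, 0.2),
              site3 = c(0.6, 0.2, 0.1, 0.1))
stdev <- apply(prob, 1, sd)       # standard deviation across models
cv <- stdev / rowMeans(prob)      # coefficient of variation across models

print(round(cbind(entropy, stdev, cv), 3))
```

Under this formulation, site1 (full agreement) has an entropy of 0 and site2 (an even split) has an entropy of 1.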
Value
- a Raster object if newdata is a Raster object
- a numeric vector (or a data.frame) if newdata is a data.frame object
Author(s)
Babak Naimi naimi.b@gmail.com
https://www.biogeoinformatics.org/
References
Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, 39:368-375, DOI: 10.1111/ecog.01881
Examples
## Not run:
library(sdm)
library(terra) # provides vect() and rast()
file <- system.file("external/species.shp", package="sdm") # get the location of the species data
species <- vect(file) # read the shapefile
path <- system.file("external", package="sdm") # path to the folder that contains the data
lst <- list.files(path=path,pattern='asc$',full.names = TRUE) # list the names of the raster files
# rast is a function in the terra package, used to read/create a multi-layer raster dataset
preds <- rast(lst) # making a raster object
d <- sdmData(formula=Occurrence~., train=species, predictors=preds)
d
# fit the models (5 methods, and 10 replications using the bootstrapping procedure):
m <- sdm(Occurrence~.,data=d,methods=c('rf','tree','fda','mars','svm'),
         replication='boot',n=10)
# ensemble using weighted averaging based on AUC statistic:
p1 <- ensemble(m, newdata=preds, filename='ens.img',setting=list(method='weighted',stat='AUC'))
plot(p1)
# ensemble using weighted averaging based on TSS statistic
# and optimum threshold criterion 2 (i.e., max(se+sp)):
p2 <- ensemble(m, newdata=preds, filename='ens2.img',setting=list(method='weighted',
               stat='TSS',opt=2))
plot(p2)
## End(Not run)
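The numeric opt codes listed in Details can be looked up by name instead of by position. The sketch below uses only base R to build such a lookup and a setting list; the item names follow the Details section, the chosen values are arbitrary, and the final call is left commented out because it needs the fitted models m and the predictors preds from the examples above.

```r
# Threshold criteria in the order given for opt (positions 1 to 14)
criteria <- c("sp=se", "max(se+sp)", "min(cost)", "minROCdist", "max(kappa)",
              "max(ppv+npv)", "ppv=npv", "max(NMI)", "max(ccr)", "prevalence",
              "P10", "P5", "P1", "P0")

# Look up the numeric code of a criterion by its name
opt <- match("max(se+sp)", criteria)

# Collect the settings in a list (values chosen only for illustration)
setting <- list(method = "weighted", stat = "TSS", opt = opt,
                wtest = "test.dep", power = 2)
str(setting)

# With the fitted models m and the predictors preds from the examples above:
# p3 <- ensemble(m, newdata = preds, setting = setting)
```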