brt_fit {dynamicSDM} | R Documentation |
Fit boosted regression tree models to species distribution or abundance data.
Description
Fit gradient boosted regression tree models to species distribution and abundance data and associated dynamic explanatory variables.
Usage
brt_fit(
occ.data,
response.col,
varnames,
distribution,
block.col,
weights.col,
test.data,
interaction.depth,
n.trees = 5000,
shrinkage = 0.001
)
Arguments
occ.data |
a data frame, the data to fit boosted regression tree models to, containing
columns for model response and explanatory variable data. If required, |
response.col |
a character string, the name of the column in |
varnames |
a character vector, the names of the columns containing model explanatory
variables in |
distribution |
a character string, the model distribution family to use, such as |
block.col |
optional; a character string, the name of the column in |
weights.col |
a character string, the name of the column in |
test.data |
optional; a data frame, the testing dataset for optimising |
interaction.depth |
optional; an integer specifying the maximum depth of each tree (i.e. highest level of variable interactions allowed). Default optimises depth between 1 and 4. |
n.trees |
optional; an integer, the number of trees in boosted regression tree models. Default is 5000. |
shrinkage |
optional; a numeric value, the shrinkage parameter applied to each tree in the boosted regression tree expansion. Also known as the learning rate. Default is 0.001. |
Details
This function fits a gradient boosting model (a gbm object) to the response and explanatory variable data provided, using the gbm R package (Greenwell et al., 2019).
Key functionality for dynamic SDMs within brt_fit() includes:
Optimise interaction.depth
If interaction.depth is not given, then brt_fit() will vary the interaction depth parameter between 1 (an additive model) and 4 (a four-way interaction model). For each interaction.depth value, model performance is measured by calculating the root-mean-square error of model predictions compared to actual values in the testing data. The interaction.depth value that results in the lowest root-mean-square error is used when fitting the returned model.
The model testing dataset used can be given using either test.data or block.col (expanded on below).
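The depth selection described above can be illustrated with a minimal sketch that calls gbm directly. This is not the internal code of brt_fit(); the simulated data, the variable names x1 and x2, the helper rmse(), and the reduced n.trees and larger shrinkage (chosen only to keep the example fast) are all assumptions made for illustration.

```r
library(gbm)

set.seed(1)

# Simulated presence/absence data; x1 and x2 are hypothetical explanatory variables
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(dat$x1 - 0.5 * dat$x2))

# Hold out the last 100 rows as the testing dataset
train <- dat[1:300, ]
test <- dat[301:400, ]

# Root-mean-square error between observed and predicted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Fit one model per candidate interaction depth and score it on the test data
depths <- 1:4
errors <- sapply(depths, function(d) {
  fit <- gbm(presence ~ x1 + x2,
             data = train,
             distribution = "bernoulli",
             n.trees = 500,
             interaction.depth = d,
             shrinkage = 0.01)
  pred <- predict(fit, newdata = test, n.trees = 500, type = "response")
  rmse(test$presence, pred)
})

# Keep the depth with the lowest test RMSE
best_depth <- depths[which.min(errors)]
```

Here errors holds one RMSE per candidate depth, and best_depth is the value that would then be used to fit the final model.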
Split by spatio-temporal blocks to account for spatial and temporal autocorrelation
If block.col is specified, then each unique block is excluded in turn in a jack-knife approach following Bagchi et al. (2013). This approach uses each block as the model testing dataset in numerical order, whilst all other block.col blocks are used as training data for the boosted regression tree model.
In this case, the function returns a list of fitted boosted regression tree models, one for each unique blocking category in block.col.
If block.col is not given, models are fit to all occ.data and a single gbm model is returned.
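The blocked jack-knife described above can be sketched as a leave-one-block-out loop around gbm. This is an illustration under stated assumptions rather than the code used by brt_fit(): the simulated data, the column names x1, x2 and block, the fixed interaction.depth, and the reduced n.trees are all hypothetical choices.

```r
library(gbm)

set.seed(42)

# Simulated data with three hypothetical spatio-temporal blocks
n <- 300
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  block = rep(1:3, each = n / 3)
)
dat$presence <- rbinom(n, 1, plogis(dat$x1 - 0.5 * dat$x2))

# Process blocks in numerical order
blocks <- sort(unique(dat$block))

fits <- lapply(blocks, function(b) {
  train <- dat[dat$block != b, ]  # all other blocks are training data
  test <- dat[dat$block == b, ]   # the excluded block is the testing data
  model <- gbm(presence ~ x1 + x2,
               data = train,
               distribution = "bernoulli",
               n.trees = 500,
               interaction.depth = 2,
               shrinkage = 0.01)
  pred <- predict(model, newdata = test, n.trees = 500, type = "response")
  list(model = model, rmse = sqrt(mean((test$presence - pred)^2)))
})
names(fits) <- paste0("block_", blocks)

# One fitted model per unique block
models <- lapply(fits, `[[`, "model")
```

The resulting models list has one gbm object per block, each trained without the block it is named after, which mirrors the list structure returned when block.col is supplied.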
Weighted by spatio-temporal sampling effort
If weights.col is specified, records are weighted by their associated value in this column when model fitting. For instance, the user may wish to down-weight records collected at oversampled sites and times, and up-weight records from undersampled ones, to account for spatio-temporal biases in occurrence records (Stolar and Nielsen, 2015).
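As a rough illustration of effort-based weighting, the sketch below passes record weights to gbm through its weights argument. The effort column, the inverse-effort weighting scheme, and the simulated data are assumptions made for this example; the source does not prescribe how weights should be calculated.

```r
library(gbm)

set.seed(7)

# Simulated records; effort is a hypothetical count of sampling events
# at each record's site and time
n <- 300
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  effort = sample(1:10, n, replace = TRUE)
)
dat$presence <- rbinom(n, 1, plogis(dat$x1))

# Down-weight records from heavily sampled sites and times (assumed scheme),
# then rescale so the weights average to 1
dat$weight <- 1 / dat$effort
dat$weight <- dat$weight / mean(dat$weight)

# gbm requires weights to be positive; they need not be normalised
fit <- gbm(presence ~ x1 + x2,
           data = dat,
           weights = weight,
           distribution = "bernoulli",
           n.trees = 500,
           interaction.depth = 2,
           shrinkage = 0.01)
```

With brt_fit(), the equivalent step would be to store such a column in occ.data and pass its name as weights.col.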
Value
Returns a gbm model object or list of gbm model objects.
References
Bagchi, R., Crosby, M., Huntley, B., Hole, D. G., Butchart, S. H. M., Collingham, Y., Kalra, M., Rajkumar, J., Rahmani, A. & Pandey, M. 2013. Evaluating the effectiveness of conservation site networks under climate change: accounting for uncertainty. Global Change Biology, 19, 1236-1248.
Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. 2019. Package ‘gbm’. R package version, 2.
Stolar, J. & Nielsen, S. E. 2015. Accounting For Spatially Biased Sampling Effort In Presence-Only Species Distribution Modelling. Diversity And Distributions, 21, 595-608.
Examples
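# Load the package that provides brt_fit() and the sample data
library(dynamicSDM)
data("sample_explan_data")

# Randomly assign roughly 75% of records to training and 25% to testing
split <- sample(c(TRUE, FALSE),
                size = nrow(sample_explan_data),
                replace = TRUE,
                prob = c(0.75, 0.25))
training <- sample_explan_data[split, ]
testing <- sample_explan_data[!split, ]

# Fit a boosted regression tree to presence/absence data with a fixed
# interaction depth, so no depth optimisation is run
brt_fit(
  occ.data = training,
  test.data = testing,
  response.col = "presence.absence",
  distribution = "bernoulli",
  varnames = colnames(training)[14:16],
  interaction.depth = 2
)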