
brt_fit {dynamicSDM}

R Documentation

Fit boosted regression tree models to species distribution or abundance data.

Description

Fit gradient boosted regression tree models to species distribution and abundance data and associated dynamic explanatory variables.

Usage

brt_fit(
  occ.data,
  response.col,
  varnames,
  distribution,
  block.col,
  weights.col,
  test.data,
  interaction.depth,
  n.trees = 5000,
  shrinkage = 0.001
)

Arguments

`occ.data`	a data frame, the data to fit boosted regression tree models to, containing columns for model response and explanatory variable data. If required, `occ.data` should contain `block.col` and `weights.col` columns too.
`response.col`	a character string, the name of the column in `occ.data` containing the response variable.
`varnames`	a character vector, the names of the columns in `occ.data` containing model explanatory variables.
`distribution`	a character string, the model distribution family to use, such as `gaussian`, `poisson` or `bernoulli`.
`block.col`	optional; a character string, the name of the column in `occ.data` containing spatio-temporal block numbers for `occ.data` splitting. See details for more information.
`weights.col`	optional; a character string, the name of the column in `occ.data` containing spatio-temporal sampling effort weights to be used in the model fitting process.
`test.data`	optional; a data frame, the testing dataset for optimising `interaction.depth` when blocking is not used.
`interaction.depth`	optional; an integer specifying the maximum depth of each tree (i.e. highest level of variable interactions allowed). Default optimises depth between 1 and 4.
`n.trees`	optional; an integer, the number of trees in boosted regression tree models. Default is 5000.
`shrinkage`	optional; a numeric value, the shrinkage parameter applied to each tree in the boosted regression tree expansion. Also known as the learning rate. Default is 0.001.

Details

This function fits a gradient boosted regression tree model (a gbm object) to the response and explanatory variable data provided, using the gbm R package (Greenwell et al., 2019).

Key functionality for dynamic SDMs within brt_fit() includes:

Optimise interaction.depth

If interaction.depth is not given, then brt_fit() will vary the interaction depth parameter between 1 (an additive model) and 4 (four-way interaction model). For each interaction.depth value, model performance is measured by calculating the root-mean-square error of model predictions compared to actual values in the testing data. The interaction.depth value that results in the lowest root-mean-square error is used when fitting the returned model.

The model testing dataset used can either be given using test.data or block.col (expanded on below).
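The depth search described above can be sketched as follows. This is a minimal illustration written directly against the gbm package, not the internal code of brt_fit(). The simulated data, the column names (`temp`, `rain`, `presence`), the 300/100 split and the reduced `n.trees` and `shrinkage` values are all assumptions chosen so the sketch runs quickly.

```r
library(gbm)

set.seed(1)

# Simulated presence/absence data with two hypothetical explanatory variables.
n <- 400
dat <- data.frame(temp = rnorm(n), rain = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(1.5 * dat$temp - dat$rain))

# Hypothetical training and testing split.
train <- dat[1:300, ]
test  <- dat[301:400, ]

# Root-mean-square error between predictions and observed values.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Fit one model per candidate depth and score it on the testing data.
depths <- 1:4
errors <- sapply(depths, function(d) {
  fit <- gbm(presence ~ temp + rain,
             data = train,
             distribution = "bernoulli",
             n.trees = 200,       # assumption: far fewer trees than the default
             shrinkage = 0.05,    # assumption: larger learning rate for speed
             interaction.depth = d)
  pred <- predict(fit, newdata = test, n.trees = 200, type = "response")
  rmse(pred, test$presence)
})

# The depth with the lowest testing error would be used for the final model.
best_depth <- depths[which.min(errors)]
```

Here `errors` holds one root-mean-square error per candidate depth, and `best_depth` is the value that would then be passed as `interaction.depth` when fitting the returned model.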

Split by spatio-temporal blocks to account for spatial and temporal autocorrelation

If block.col is specified, then each unique block is excluded in turn in a jack-knife approach following Bagchi et al. (2013). This approach uses each block as the model testing dataset in numerical order, whilst all other blocks are used as training data for the boosted regression tree model.

In this case, the function returns a list of fitted boosted regression tree models, one for each unique block in block.col.

If block.col is not given, models are fit to all occ.data and a single gbm model is returned.
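The leave-one-block-out scheme can be sketched in base R as below. The data frame, its `block`, `x` and `y` columns, and the use of `lm()` are assumptions for illustration only: `lm()` stands in for the boosted regression tree fit so that the sketch runs without extra packages, and it is not how brt_fit() fits its models.

```r
set.seed(1)

# Hypothetical data: 100 records assigned to four spatio-temporal blocks.
dat <- data.frame(block = rep(1:4, each = 25), x = rnorm(100))
dat$y <- 2 * dat$x + rnorm(100)

# Visit the blocks in numerical order.
blocks <- sort(unique(dat$block))

# For each block, hold it out as testing data and train on all other blocks.
splits <- lapply(blocks, function(b) {
  list(train = dat[dat$block != b, ],
       test  = dat[dat$block == b, ])
})
names(splits) <- blocks

# One model per held-out block (lm() is a stand-in for the gbm fit).
models <- lapply(splits, function(s) lm(y ~ x, data = s$train))
```

The resulting `models` list has one element per unique block, mirroring the list of models returned when `block.col` is given.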

Weighted by spatio-temporal sampling effort

If weights.col is specified, records are weighted by their associated value in this column during model fitting. For instance, the user may wish to down-weight records collected at oversampled sites and times, and up-weight records from undersampled ones, to account for spatio-temporal biases in occurrence records (Stolar and Nielsen, 2015).
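As a rough illustration of down-weighting oversampled sites, the sketch below gives each record a weight equal to the inverse of the number of records at its site. This weighting scheme, the `records` data frame and its `site`, `n_site` and `weight` columns are hypothetical; they are not the method used by dynamicSDM or by Stolar and Nielsen (2015).

```r
# Hypothetical occurrence records: site A is sampled far more often than site C.
records <- data.frame(
  site     = c("A", "A", "A", "A", "B", "B", "C"),
  presence = c(1, 0, 1, 1, 0, 1, 1)
)

# Count how many records share each site.
records$n_site <- ave(seq_along(records$site), records$site, FUN = length)

# Inverse-count weights: every site contributes a total weight of 1.
records$weight <- 1 / records$n_site
```

A column built this way could then be named in `weights.col` so that the four records from site A together count as much as the single record from site C.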

Value

Returns a gbm model object or list of gbm model objects.

References

Bagchi, R., Crosby, M., Huntley, B., Hole, D. G., Butchart, S. H. M., Collingham, Y., Kalra, M., Rajkumar, J., Rahmani, A. & Pandey, M. 2013. Evaluating the effectiveness of conservation site networks under climate change: accounting for uncertainty. Global Change Biology, 19, 1236-1248.

Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. 2019. Package ‘gbm’. R package version, 2.

Stolar, J. & Nielsen, S. E. 2015. Accounting for spatially biased sampling effort in presence-only species distribution modelling. Diversity and Distributions, 21, 595-608.

Examples


data("sample_explan_data")

# Randomly assign roughly 75% of records to training and 25% to testing
split <- sample(c(TRUE, FALSE),
                size = nrow(sample_explan_data),
                replace = TRUE,
                prob = c(0.75, 0.25))

training <- sample_explan_data[split, ]
testing <- sample_explan_data[!split, ]

brt_fit(
 occ.data = training,
 test.data = testing,
 response.col = "presence.absence",
 distribution = "bernoulli",
 varnames = colnames(training)[14:16],
 interaction.depth = 2
)

[Package dynamicSDM version 1.3.4 Index]