trainESM {enmSdmX}    R Documentation

Calibrate an ensemble of small models

Description

This function calibrates a set of "ensembles of small models" (ESMs), which are designed for modeling species with few occurrence records. In the original formulation, each model has two covariates with additive (non-interacting) effects, and one model is calibrated for every possible pair of covariates. By default, this function does the same, but it can also include univariate models, models with two covariates plus their interaction term, and models with quadratic and corresponding linear terms. Currently, this function trains only generalized linear models; support for other algorithms is planned.
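The pairing of covariates described above can be sketched in base R. This is only an illustration of the idea, not the code trainESM runs internally; the predictor names and the objects pairs and forms are hypothetical.

```r
# Hypothetical response and predictor names, for illustration only
resp <- 'presBg'
preds <- c('bio1', 'bio4', 'bio12')

# every unordered pair of predictors (one pair per column)
pairs <- utils::combn(preds, 2)

# one additive, two-covariate formula per pair
forms <- apply(pairs, 2, function(p) paste(resp, '~', p[1], '+', p[2]))
forms
```

With k predictors this yields choose(k, 2) small models, which is why the number of models grows quickly as predictors are added.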

Usage

trainESM(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  univariate = FALSE,
  quadratic = FALSE,
  interaction = FALSE,
  interceptOnly = FALSE,
  method = "glm.fit",
  scale = NA,
  w = TRUE,
  family = stats::binomial(),
  ...,
  verbose = FALSE
)

Arguments

data

Data frame or matrix. Response variable and environmental predictors (and no other fields) for presence and non-presence sites.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data as predictors.

univariate, quadratic, interaction

TRUE or FALSE: Whether or not to include univariate models, quadratic models, and/or models with 2-way interactions (default is FALSE).
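As a rough guide to what these three kinds of terms look like in R formula syntax, the sketch below writes one formula of each type by hand. The exact formulas that trainESM builds are not shown on this page, so treat these forms (and the predictor names) as assumptions.

```r
# Hypothetical formulas, written by hand for illustration only
fUni <- stats::as.formula('presBg ~ bio1')                      # univariate
fQuad <- stats::as.formula('presBg ~ bio1 + I(bio1^2)')         # linear + quadratic term
fInt <- stats::as.formula('presBg ~ bio1 + bio12 + bio1:bio12') # 2-way interaction

# list the terms each formula contains
termLabels <- function(f) attr(stats::terms(f), 'term.labels')
termLabels(fUni)
termLabels(fQuad)
termLabels(fInt)
```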

interceptOnly

If TRUE, include an intercept-only model (default is FALSE).

method

Character: Name of function used to solve the GLM. For "normal" GLMs, this can be 'glm.fit' (default), 'brglmFit' (from the brglm2 package), or another function.

scale

Either NA (default), TRUE, or FALSE. If TRUE, the predictors will be centered and scaled by subtracting their means and then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "scales". For example, if you do something like model <- trainESM(data, scale=TRUE), then you can get the means and standard deviations using model$scales$mean and model$scales$sd. If FALSE, no scaling is done. If NA (default), the function checks whether non-factor predictors have means ~0 and standard deviations ~1. If they do not, a warning is printed, but the function continues.
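The centering and scaling described here can be reproduced with base R's scale(). The sketch below uses a toy data frame with hypothetical values. It also shows why the stored means and standard deviations matter: new data must be transformed with the same values before predicting.

```r
# toy predictor table (hypothetical values)
x <- data.frame(bio1 = c(10, 20, 30, 40), bio12 = c(100, 300, 500, 700))

means <- colMeans(x)
sds <- apply(x, 2, stats::sd)

# subtract each column's mean, then divide by its standard deviation
xScaled <- scale(x, center = means, scale = sds)

# new data are transformed with the SAME means and SDs, not their own
newX <- data.frame(bio1 = 25, bio12 = 400)
newScaled <- scale(newX, center = means, scale = sds)
```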

w

Weights. Any of:

  • TRUE: Causes the total weight of presences to equal the total weight of absences (if family='binomial')

  • FALSE: Each datum is assigned a weight of 1.

  • A numeric vector of weights, one per row in data.

  • The name of the column in data that contains site weights.
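To illustrate what balanced weights (w = TRUE) mean, here is one possible scheme in which the larger class is down-weighted so that both classes have the same total weight. How trainESM rescales weights internally is not stated on this page, so this scheme and the names y, nPres, nAbs, and wBalanced are assumptions.

```r
# toy response: 3 presences and 7 non-presences (hypothetical)
y <- c(rep(1, 3), rep(0, 7))

nPres <- sum(y == 1)
nAbs <- sum(y == 0)

# keep weight 1 for the smaller class and down-weight the larger class
wBalanced <- rep(1, length(y))
if (nPres > nAbs) {
  wBalanced[y == 1] <- nAbs / nPres
} else {
  wBalanced[y == 0] <- nPres / nAbs
}

# total weight per class
tapply(wBalanced, y, sum)
```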

family

Character or function. Name of family for data error structure (see family). Default is to use the 'binomial' family.

...

Arguments to pass to glm.

verbose

Logical. If TRUE then display progress.

Value

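A list object with several named elements. As used in the examples below, these include:

  • models: A list of the fitted models, one per small model.

  • tuning: A table describing the models, including an element named model that identifies each one.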

References

Breiner, F.T., Guisan, A., Bergamini, A., and Nobis, M.P. 2015. Overcoming limitations of modeling rare species by using ensembles of small models. Methods in Ecology and Evolution 6:1210-1218. doi:10.1111/2041-210X.12403

Lomba, A., Pellissier, L., Randin, C., Vicente, J., Moreira, F., Honrado, J., and Guisan, A. 2010. Overcoming the rare species modelling paradox: A novel hierarchical framework applied to an Iberian endemic plant. Biological Conservation 143:2647-2657. doi:10.1016/j.biocon.2010.07.007

See Also

trainBRT, trainGAM, trainGLM, trainMaxEnt, trainMaxNet, trainNS, trainRF, trainByCrossValid

Examples

# NB: The examples below show a very basic modeling workflow. They have been
# designed to run quickly, not to produce accurate, defensible models. Even so,
# they can take a few minutes to run.

library(terra)
set.seed(123)

### setup data
##############

# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)

# coordinate reference system
wgs84 <- getCRS('WGS84')

# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)

occs <- elimCellDuplicates(occs, madClim)

occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
	
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]

# collate occurrences and background sites
presBg <- data.frame(
  presBg = c(
    rep(1, nrow(occEnv)),
    rep(0, nrow(bgEnv))
  )
)

env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)

predictors <- c('bio1', 'bio12')

### calibrate models
####################

# "traditional" ESMs with just 2 linear predictors
# just one model in this case because we have just 2 predictors
esm1 <- trainESM(
   data = env,
   resp = 'presBg',
   preds = predictors,
   family = stats::binomial(),
   scale = TRUE,
   w = TRUE
)

str(esm1, 1)
esm1$tuning

# extended ESM with other kinds of terms
esm2 <- trainESM(
   data = env,
   resp = 'presBg',
   preds = predictors,
   univariate = TRUE,
   quadratic = TRUE,
   interaction = TRUE,
   interceptOnly = TRUE,
   family = stats::binomial(),
   scale = TRUE,
   w = TRUE,
   verbose = TRUE
)

str(esm2, 1)
esm2$tuning

### make a set of predictions to rasters
########################################

# center environmental rasters and divide by their SD
madClimScaled <- scale(madClim, center = esm2$scale$mean, scale = esm2$scale$sd)

# make one raster per model
predictions <- list()
for (i in seq_along(esm2$models)) {
    predictions[[i]] <- predict(madClimScaled, esm2$models[[i]], type = 'response')
}

# combine into a "stack"
predictions <- do.call(c, predictions)
names(predictions) <- esm2$tuning$model
plot(predictions)

# calculate (unweighted) mean
prediction <- mean(predictions)
plot(prediction)
plot(occs, pch = 1, add = TRUE)

[Package enmSdmX version 1.1.6 Index]