trainGAM {enmSdmX}R Documentation

Calibrate a generalized additive model (GAM)

Description

This function constructs a generalized additive model. By default, the model is constructed in a two-stage process. First, the "construct" phase generates a series of simple models with univariate and bivariate interaction terms. These simple models are then ranked based on their AICc. Second, the "select" phase creates a "full" model from the simple models such that there is at least presPerTermInitial presences (if the response is binary) or data rows (if not) for each smooth term to be estimated (not counting the intercept). Finally, it selects the best model using AICc from all possible subsets of this "full" model. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.

Usage

trainGAM(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  gamma = 1,
  scale = 0,
  smoothingBasis = "cs",
  interaction = "te",
  interceptOnly = TRUE,
  construct = TRUE,
  select = TRUE,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  maxTerms = 8,
  w = TRUE,
  family = "binomial",
  out = "model",
  cores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame.

resp

Response variable. This is either the name of the column in data or an integer indicating the column in data that has the response variable. The default is to use the first column in data as the response.

preds

Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.

gamma

Initial penalty to degrees of freedom to use (larger ==> smoother fits).

scale

A numeric value indicating the "scale" parameter (see argument scale in gam). The default is 0 (which allows a single smoother for Poisson and binomial error families and unknown scale for all others.)

smoothingBasis

Character. Indicates the type of smoothing basis. The default is 'cs' (cubic splines), but see smooth.terms for other options. This is the value of argument bs in a s function.

interaction

Character or NULL. Type of interaction term to use (te, ts, s, etc.). See ?te (for example) for help on any one of these. If NULL then interactions are not used.

interceptOnly

If TRUE (default) and model selection is enabled, then include an intercept-only model.

construct

If TRUE (default), then construct the model by computing AICc for all univariate and bivariate models. Then add terms up to maximum set by presPerTermInitial and maxTerms.

select

If TRUE (default), then calculate AICc for all possible subsets of models and return the model with the lowest AICc of these. This step if performed after model construction (if construct is TRUE).

presPerTermInitial

Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.

presPerTermFinal

Positive integer. Minimum number of presence sites per term in initial starting model; used only if select is TRUE.

maxTerms

Maximum number of terms to be used in any model, not including the intercept (default is 8). Used only if construct is TRUE.

w

Weights. Any of:

  • TRUE: Causes the total weight of presences to equal the total weight of absences (if family='binomial')

  • FALSE: Each datum is assigned a weight of 1.

  • A numeric vector of weights, one per row in data.

  • The name of the column in data that contains site weights.

family

Name of family for data error structure (see ?family).

out

Character vector. One or more values:

  • 'model': Model with the lowest AICc.

  • 'models': All models evaluated, sorted from lowest to highest AICc (lowest is best).

  • 'tuning': Data frame with tuning parameters, one row per model, sorted by AICc.

cores

Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when cores > 1, please see the troubleshooting_parallel_operations guide.

verbose

Logical. If TRUE then display intermediate results on the display device.

...

Extra arguments (not used).

Value

The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these.

See Also

gam

Examples


# NB: The examples below show a very basic modeling workflow. They have been 
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.

library(mgcv)
library(sf)
library(terra)
set.seed(123)

### setup data
##############

# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)

# coordinate reference system
wgs84 <- getCRS('WGS84')

# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)

occs <- elimCellDuplicates(occs, madClim)

occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
	
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]

# collate occurrences and background sites
presBg <- data.frame(
  presBg = c(
    rep(1, nrow(occEnv)),
    rep(0, nrow(bgEnv))
  )
)

env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)

predictors <- c('bio1', 'bio12')

### calibrate models
####################

# Note that all of the trainXYZ functions can made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1

# MaxEnt
mx <- trainMaxEnt(
	data = env,
	resp = 'presBg',
	preds = predictors,
	regMult = 1, # too few values for reliable model, but fast
	verbose = TRUE,
	cores = cores
)

# MaxNet
mn <- trainMaxNet(
	data = env,
	resp = 'presBg',
	preds = predictors,
	regMult = 1, # too few values for reliable model, but fast
	verbose = TRUE,
	cores = cores
)

# generalized linear model (GLM)
gl <- trainGLM(
	data = env,
	resp = 'presBg',
	preds = predictors,
	scale = TRUE, # automatic scaling of predictors
	verbose = TRUE,
	cores = cores
)

# generalized additive model (GAM)
ga <- trainGAM(
	data = env,
	resp = 'presBg',
	preds = predictors,
	verbose = TRUE,
	cores = cores
)

# natural splines
ns <- trainNS(
	data = env,
	resp = 'presBg',
	preds = predictors,
	scale = TRUE, # automatic scaling of predictors
	df = 1:2, # too few values for reliable model(?)
	verbose = TRUE,
	cores = cores
)

# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
	data = envSub,
	resp = 'presBg',
	preds = predictors,
	learningRate = 0.001, # too few values for reliable model(?)
	treeComplexity = c(2, 3), # too few values for reliable model, but fast
	minTrees = 1200, # minimum trees for reliable model(?), but fast
	maxTrees = 1200, # too small for reliable model(?), but fast
	tryBy = 'treeComplexity',
	anyway = TRUE, # return models that did not converge
	verbose = TRUE,
	cores = cores
)

# random forests
rf <- trainRF(
	data = env,
	resp = 'presBg',
	preds = predictors,
	numTrees = c(100, 500), # using at least 500 recommended, but fast!
	verbose = TRUE,
	cores = cores
)

### make maps of models
#######################

# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().

mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim) 
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)

maps <- c(
	mxMap,
	mnMap,
	glMap,
	gaMap,
	nsMap,
	brtMap,
	rfMap
)

names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)

### compare model responses to BIO12 (mean annual precipitation)
################################################################

# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL

minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)

predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)


plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))

lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')

legend(
   'topleft',
   inset = 0.01,
   legend = c(
	'MaxEnt',
	'MaxNet',
	'GLM',
	'GAM',
	'NS',
	'BRT',
	'RF'
   ),
   lty = c(1, 1:6),
   col = c(
	'black',
	'red',
	'blue',
	'green',
	'purple',
	'orange',
	'cyan'
   ),
   bg = 'white'
)



[Package enmSdmX version 1.1.5 Index]