trainMaxEnt {enmSdmX}R Documentation

Calibrate a MaxEnt (ver 3.3.3+ or "maxent") model using AICc

Description

This function calculates the "best" Maxent model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the maxent jar file (see Details). Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.

Usage

trainMaxEnt(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10),
  classes = "default",
  testClasses = TRUE,
  dropOverparam = TRUE,
  anyway = TRUE,
  forceLinear = TRUE,
  jackknife = FALSE,
  arguments = NULL,
  scratchDir = NULL,
  out = "model",
  cores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame.

resp

Response variable. This is either the name of the column in data or an integer indicating the column in data that has the response variable. The default is to use the first column in data as the response.

preds

Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.

regMult

Numeric vector. Values of the master regularization parameters (called beta in some publications) to test.

classes

Character list. Names of feature classes to use (either default to use lpqh) or any combination of lpqht, where l ==> linear features, p ==> product features, q ==> quadratic features, h ==> hinge features, and t ==> threshold features.

testClasses

Logical. If TRUE (default) then test all possible combinations of classes (note that all tested models will at least have linear features). If FALSE then use the classes provided (these will not vary between models).

dropOverparam

Logical, if TRUE (default), drop models if they have more coefficients than training occurrences. It is possible for no models to fulfill this criterion, in which case no models will be returned.

anyway

Logical. Same as dropOverparam (included for backwards compatibility. If NULL (default), then the value of dropOverparam will take precedence. If TRUE or FALSE then anyway will override the value of dropOverparam.

forceLinear

Logical. If TRUE (default) then require any tested models to include at least linear features.

jackknife

Logical. If TRUE (default) the the returned model will be also include jackknife testing of variable importance.

arguments

NULL (default) or a character list. Options to pass to maxent()'s args argument. (Do not include l, p, q, h, t, betamultiplier, or jackknife!)

scratchDir

Character. Directory to which to write temporary files. Leave as NULL to create a temporary folder in the current working directory.

out

Character vector. One or more values:

  • 'model': Model with the lowest AICc.

  • 'models': All models evaluated, sorted from lowest to highest AICc (lowest is best).

  • 'tuning': Data frame with tuning parameters, one row per model, sorted by AICc.

cores

Number of cores to use. Default is 1. If you have issues when cores > 1, please see the troubleshooting_parallel_operations guide.

verbose

Logical. If TRUE report progress and AICc table.

...

Extra arguments. Not used.

Details

This function is a wrapper for MaxEnt(). The MaxEnt function creates a series of files on disk for each model. This function assumes you do not want those files, so deletes most of them. However, there is one that cannot be deleted and the normal ways of changing its permissions in R do not work. So the function simply writes over that file (which is allowed) to make it smaller. Regardless, if you run many models your temporary directory (argument scratchDir) can fill up and require manual deletion.

Value

The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these.

References

Warren, D.L. and S.N. Siefert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342. doi:10.1890/10-1171.1

See Also

MaxEnt

Examples


# NB: The examples below show a very basic modeling workflow. They have been 
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.

library(mgcv)
library(sf)
library(terra)
set.seed(123)

### setup data
##############

# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)

# coordinate reference system
wgs84 <- getCRS('WGS84')

# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)

occs <- elimCellDuplicates(occs, madClim)

occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
	
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]

# collate occurrences and background sites
presBg <- data.frame(
  presBg = c(
    rep(1, nrow(occEnv)),
    rep(0, nrow(bgEnv))
  )
)

env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)

predictors <- c('bio1', 'bio12')

### calibrate models
####################

# Note that all of the trainXYZ functions can made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1

# MaxEnt
mx <- trainMaxEnt(
	data = env,
	resp = 'presBg',
	preds = predictors,
	regMult = 1, # too few values for reliable model, but fast
	verbose = TRUE,
	cores = cores
)

# MaxNet
mn <- trainMaxNet(
	data = env,
	resp = 'presBg',
	preds = predictors,
	regMult = 1, # too few values for reliable model, but fast
	verbose = TRUE,
	cores = cores
)

# generalized linear model (GLM)
gl <- trainGLM(
	data = env,
	resp = 'presBg',
	preds = predictors,
	scale = TRUE, # automatic scaling of predictors
	verbose = TRUE,
	cores = cores
)

# generalized additive model (GAM)
ga <- trainGAM(
	data = env,
	resp = 'presBg',
	preds = predictors,
	verbose = TRUE,
	cores = cores
)

# natural splines
ns <- trainNS(
	data = env,
	resp = 'presBg',
	preds = predictors,
	scale = TRUE, # automatic scaling of predictors
	df = 1:2, # too few values for reliable model(?)
	verbose = TRUE,
	cores = cores
)

# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
	data = envSub,
	resp = 'presBg',
	preds = predictors,
	learningRate = 0.001, # too few values for reliable model(?)
	treeComplexity = c(2, 3), # too few values for reliable model, but fast
	minTrees = 1200, # minimum trees for reliable model(?), but fast
	maxTrees = 1200, # too small for reliable model(?), but fast
	tryBy = 'treeComplexity',
	anyway = TRUE, # return models that did not converge
	verbose = TRUE,
	cores = cores
)

# random forests
rf <- trainRF(
	data = env,
	resp = 'presBg',
	preds = predictors,
	numTrees = c(100, 500), # using at least 500 recommended, but fast!
	verbose = TRUE,
	cores = cores
)

### make maps of models
#######################

# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().

mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim) 
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)

maps <- c(
	mxMap,
	mnMap,
	glMap,
	gaMap,
	nsMap,
	brtMap,
	rfMap
)

names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)

### compare model responses to BIO12 (mean annual precipitation)
################################################################

# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL

minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)

predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)


plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))

lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')

legend(
   'topleft',
   inset = 0.01,
   legend = c(
	'MaxEnt',
	'MaxNet',
	'GLM',
	'GAM',
	'NS',
	'BRT',
	'RF'
   ),
   lty = c(1, 1:6),
   col = c(
	'black',
	'red',
	'blue',
	'green',
	'purple',
	'orange',
	'cyan'
   ),
   bg = 'white'
)



[Package enmSdmX version 1.1.5 Index]