R: GLM calibration with presence-absence data

calibration_glm {enmpa}

R Documentation

GLM calibration with presence-absence data

Description

Creates candidate models based on distinct parameter settings, evaluates models, and selects the ones that perform the best.

Usage

calibration_glm(data, dependent, independent, weights = NULL,
                response_type = "l", formula_mode = "moderate",
                minvar = 1, maxvar = NULL, user_formulas = NULL,
                cv_kfolds = 5, partition_index = NULL, seed = 1,
                n_threshold = 100, selection_criterion = "TSS",
                exclude_bimodal = FALSE, tolerance = 0.01,
                out_dir = NULL, parallel = FALSE,
                n_cores = NULL, verbose = TRUE)

Arguments

`data`	data.frame or matrix of data to be used in model calibration. Columns represent dependent and independent variables.
`dependent`	(character) name of dependent variable.
`independent`	(character) vector of name(s) of independent variable(s).
`weights`	(numeric) a vector with the weights for observations.
`response_type`	(character) a character string that must contain "l", "p", "q" or a combination of them. l = linear, q = quadratic, p = product (interaction) between two variables. Default = "l".
`formula_mode`	(character) a character string to indicate the strategy to create the formulas for candidate models. Options are: "light", "moderate", "intensive", or "complex". Default = "moderate". "complex" returns only the most complex formula defined in `response_type`.
`minvar`	(numeric) minimum number of independent variables in formulas.
`maxvar`	(numeric) maximum number of independent variables in formulas.
`user_formulas`	(character) vector with formula(s) to test. Default = NULL.
`cv_kfolds`	(numeric) number of folds to use for k-fold cross-validation exercises. Default = 5. Ignored if `partition_index` is defined.
`partition_index`	list of indices for k-fold cross-validation. The default, NULL, uses the function `kfold_partition`.
`seed`	(numeric) a seed for k-fold partitioning.
`n_threshold`	(numeric) number of threshold values to produce evaluation metrics. Default = 100.
`selection_criterion`	(character) criterion used to select best models, options are "TSS" and "ESS". Default = "TSS".
`exclude_bimodal`	(logical) whether to filter out models with one or more variables presenting concave responses. Default = FALSE.
`tolerance`	(numeric) value used to relax the limit of the metric used to filter models during model selection if none of the models meet the initial criteria. Default = 0.01.
`out_dir`	(character) output directory name to save the main calibration results. Default = NULL.
`parallel`	(logical) whether to run in parallel or sequentially. Default = FALSE.
`n_cores`	(numeric) number of cores to use. The default, NULL, uses the number of free processors - 1.
`verbose`	(logical) whether to print messages and show progress bar. Default = TRUE.
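
As an illustration of what a list of k-fold indices can look like, the following base R sketch builds a stratified partition by hand. The helper `make_kfold_index` is hypothetical and is not part of enmpa; it also assumes that each list element holds the row indices of one fold, which should be checked against the `kfold_partition` documentation before passing such a list to `partition_index`.

```r
# Hypothetical helper (not part of enmpa): stratified k-fold partition.
# Assumption: each list element holds the row indices assigned to one fold.
make_kfold_index <- function(y, k = 5, seed = 1) {
  set.seed(seed)
  folds <- vector("list", k)
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle the rows of this class, then deal them round-robin into folds
    idx <- idx[sample.int(length(idx))]
    fold_id <- rep_len(seq_len(k), length(idx))
    for (f in seq_len(k)) {
      folds[[f]] <- c(folds[[f]], idx[fold_id == f])
    }
  }
  names(folds) <- paste0("Fold_", seq_len(k))
  lapply(folds, sort)
}

# Toy presence-absence vector: 12 presences followed by 18 absences
y <- rep(c(1, 0), times = c(12, 18))
part <- make_kfold_index(y, k = 3, seed = 1)
lengths(part)
```

Dealing each class separately keeps the proportion of presences and absences roughly equal across folds, which matters when presences are scarce.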

Details

Model evaluation is done considering the ability to predict presences and absences, as well as model fitting and complexity. Model selection consists of three steps: 1) a first filter to keep the models with ROC AUC >= 0.5 (statistically significant models), 2) a second filter to maintain only models that meet the `selection_criterion` ("TSS": TSS >= 0.4; or "ESS": maximum Accuracy - tolerance), and 3) from those, pick the ones with delta AIC <= 2.
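
The three selection steps can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names (`auc`, `tss`, `aic`) and the helper `select_models` are hypothetical, the values are made up for the example, and delta AIC is assumed to be computed among the models that pass the first two filters. The relaxation controlled by `tolerance` is not reproduced here.

```r
# Hypothetical summary table; column names and values are illustrative only.
stats <- data.frame(
  model = paste0("m", 1:6),
  auc   = c(0.48, 0.71, 0.80, 0.83, 0.78, 0.65),
  tss   = c(0.05, 0.35, 0.52, 0.55, 0.47, 0.41),
  aic   = c(310, 295, 281, 280, 284.5, 299),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not enmpa's internal code) for the "TSS" criterion
select_models <- function(stats, auc_min = 0.5, tss_min = 0.4, daic_max = 2) {
  # Step 1: keep models with ROC AUC >= 0.5
  keep <- stats[stats$auc >= auc_min, , drop = FALSE]
  # Step 2: keep models that meet the TSS criterion
  keep <- keep[keep$tss >= tss_min, , drop = FALSE]
  if (nrow(keep) == 0) return(keep)
  # Step 3: delta AIC among the remaining models (assumption), keep <= 2
  keep$delta_aic <- keep$aic - min(keep$aic)
  keep[keep$delta_aic <= daic_max, , drop = FALSE]
}

selected <- select_models(stats)
selected$model
```

In this toy table, m1 fails the AUC filter, m2 fails the TSS filter, and m5 and m6 are dropped by the delta AIC filter, leaving m3 and m4.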

The `formula_mode` options determine the strategy used to combine the predictors and the response types defined in `response_type` when creating candidate models:

light: returns simple iterations of complex formulas.
moderate: returns a comprehensive number of iterations.
intensive: returns all possible combinations. Very time-consuming for 6 or more independent variables.
complex: returns only the most complex formula.
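
To give a sense of why exhaustive enumeration becomes expensive, the sketch below lists every non-empty subset of predictors using linear terms only. The helper `all_linear_formulas` is hypothetical and does not reproduce how enmpa builds its formulas; the variable name `bio_15` is included only for illustration. With quadratic and product terms added, the number of candidates grows considerably faster than shown here.

```r
# Hypothetical helper (not enmpa's internal code): one formula per non-empty
# subset of predictors, linear terms only.
all_linear_formulas <- function(dependent, independent) {
  subsets <- unlist(
    lapply(seq_along(independent), function(m) {
      combn(independent, m, simplify = FALSE)
    }),
    recursive = FALSE
  )
  vapply(subsets, function(vars) {
    paste(dependent, "~", paste(vars, collapse = " + "))
  }, character(1))
}

forms <- all_linear_formulas("Sp", c("bio_1", "bio_12", "bio_15"))
forms
length(forms)  # 2^3 - 1 = 7

# Number of linear-only candidates for 1 to 10 predictors
sapply(1:10, function(n) 2^n - 1)
```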

Value

An object of the class enmpa_calibration containing: selected models, a summary of statistics for all models, results obtained in cross-validation for all models, original data used, weights, and data-partition indices used.

Examples

# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Calibration using linear (l), quadratic (q), and product (p) responses.
cal_res <- calibration_glm(data = enm_data, dependent = "Sp",
                           independent = c("bio_1", "bio_12"),
                           response_type = "lpq", formula_mode = "moderate",
                           selection_criterion = "TSS", cv_kfolds = 3,
                           exclude_bimodal = TRUE, verbose = FALSE)
print(cal_res)
summary(cal_res)

head(cal_res$calibration_results)
head(cal_res$summary)
head(cal_res$selected)
head(cal_res$data)

[Package enmpa version 0.1.8 Index]