calibration_glm {enmpa} | R Documentation |
GLM calibration with presence-absence data
Description
Creates candidate models based on distinct parameter settings, evaluates models, and selects the ones that perform the best.
Usage
calibration_glm(data, dependent, independent, weights = NULL,
response_type = "l", formula_mode = "moderate",
minvar = 1, maxvar = NULL, user_formulas = NULL,
cv_kfolds = 5, partition_index = NULL, seed = 1,
n_threshold = 100, selection_criterion = "TSS",
exclude_bimodal = FALSE, tolerance = 0.01,
out_dir = NULL, parallel = FALSE,
n_cores = NULL, verbose = TRUE)
Arguments
data |
data.frame or matrix of data to be used in model calibration. Columns represent dependent and independent variables. |
dependent |
(character) name of dependent variable. |
independent |
(character) vector of name(s) of independent variable(s). |
weights |
(numeric) a vector with the weights for observations. |
response_type |
(character) a character string that must contain "l", "p", "q" or a combination of them. l = linear, q = quadratic, p = interaction between two variables. Default = "l". |
formula_mode |
(character) a character string to indicate the strategy to
create the formulas for candidate models. Options are: "light", "moderate",
"intensive", or "complex". Default = "moderate". "complex" returns only the
most complex formula defined in |
minvar |
(numeric) minimum number of independent variables in formulas. |
maxvar |
(numeric) maximum number of independent variables in formulas. |
user_formulas |
(character) vector with formula(s) to test. Default = NULL. |
cv_kfolds |
(numeric) number of folds to use for k-fold
cross-validation exercises. Default = 5. Ignored if |
partition_index |
list of indices for cross-validation in k-fold. The
default, NULL, uses the function |
seed |
(numeric) a seed for k-fold partitioning. |
n_threshold |
(numeric) number of threshold values used to produce evaluation metrics. Default = 100. |
selection_criterion |
(character) criterion used to select best models, options are "TSS" and "ESS". Default = "TSS". |
exclude_bimodal |
(logical) whether to filter out models with one or more variables presenting concave responses. Default = FALSE. |
tolerance |
(numeric) value used to adjust the limit of the metric used to filter models during model selection if none of the models meet the initial criteria. Default = 0.01. |
out_dir |
(character) output directory name to save the main calibration results. Default = NULL. |
parallel |
(logical) whether to run in parallel or sequentially. Default = FALSE. |
n_cores |
(numeric) number of cores to use. Default = number of free processors - 1. |
verbose |
(logical) whether to print messages and show a progress bar. Default = TRUE. |
Details
Model evaluation considers the ability to predict presences and
absences, as well as model fit and complexity. Model selection consists
of three steps: 1) a first filter to keep the models with ROC AUC >= 0.5
(statistically significant models), 2) a second filter to maintain only
models that meet the selection_criterion ("TSS": TSS >= 0.4; or "ESS":
maximum Accuracy - tolerance), and 3) from those, pick the ones with
delta AIC <= 2.
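The three selection steps can be sketched in base R as shown here. This is only an illustration under stated assumptions, not the package's internal code: the data frame model_stats, its column names (ROC_AUC, TSS, AIC), and all values are made up, and delta AIC is assumed to be computed among the models that pass the first two filters. The tolerance adjustment is left out.

```r
# Hypothetical evaluation table; names and values are illustrative only
model_stats <- data.frame(
  model   = c("m1", "m2", "m3", "m4"),
  ROC_AUC = c(0.45, 0.80, 0.85, 0.70),
  TSS     = c(0.10, 0.45, 0.50, 0.30),
  AIC     = c(210, 200, 201, 190),
  stringsAsFactors = FALSE
)

# Step 1: keep models with ROC AUC >= 0.5
step1 <- model_stats[model_stats$ROC_AUC >= 0.5, ]

# Step 2: keep models that meet the selection criterion ("TSS": TSS >= 0.4)
step2 <- step1[step1$TSS >= 0.4, ]

# Step 3: among those, keep models with delta AIC <= 2
# (assumption: delta AIC is relative to the best remaining model)
step2$delta_AIC <- step2$AIC - min(step2$AIC)
selected <- step2[step2$delta_AIC <= 2, ]

selected$model
```

In this toy table, m4 has the lowest AIC overall but is dropped at step 2, so it does not influence the delta AIC of the remaining models.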
The formula_mode options determine the strategy used to iterate over the
predictors defined in response_type when creating models:
- light: returns simple iterations of complex formulas.
- moderate: returns a comprehensive number of iterations.
- intensive: returns all possible combinations. Very time-consuming for 6 or more independent variables.
- complex: returns only the most complex formula.
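To give a sense of why exhaustive combinations become expensive, here is a minimal base R sketch that enumerates every subset of predictors between minvar and maxvar, using linear terms only. It is not how enmpa builds its formulas. The variable names and the object names subsets and formulas are assumptions for illustration, and adding quadratic and product terms would increase the count further.

```r
# Hypothetical inputs, mirroring the argument names of calibration_glm()
dependent   <- "Sp"
independent <- c("bio_1", "bio_12", "bio_15")
minvar <- 1
maxvar <- length(independent)

# All subsets of predictors of size minvar..maxvar (linear terms only)
subsets <- unlist(
  lapply(minvar:maxvar, function(k) combn(independent, k, simplify = FALSE)),
  recursive = FALSE
)

# Turn each subset into a formula string
formulas <- vapply(
  subsets,
  function(v) paste(dependent, "~", paste(v, collapse = " + ")),
  character(1)
)

length(formulas)
formulas
```

With n predictors and linear terms only, this enumeration yields 2^n - 1 formulas, so the number of candidate models roughly doubles with each added variable.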
Value
An object of the class enmpa_calibration containing: selected models, a summary of statistics for all models, results obtained in cross-validation for all models, original data used, weights, and data-partition indices used.
Examples
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)
# Calibration using linear (l), quadratic (q), and product (p) responses.
cal_res <- calibration_glm(data = enm_data, dependent = "Sp",
                           independent = c("bio_1", "bio_12"),
                           response_type = "lpq", formula_mode = "moderate",
                           selection_criterion = "TSS", cv_kfolds = 3,
                           exclude_bimodal = TRUE, verbose = FALSE)
print(cal_res)
summary(cal_res)
head(cal_res$calibration_results)
head(cal_res$summary)
head(cal_res$selected)
head(cal_res$data)