calibration_glm {enmpa}    R Documentation

GLM calibration with presence-absence data

Description

Creates candidate models based on distinct parameter settings, evaluates models, and selects the ones that perform the best.

Usage

calibration_glm(data, dependent, independent, weights = NULL,
                response_type = "l", formula_mode = "moderate",
                minvar = 1, maxvar = NULL, user_formulas = NULL,
                cv_kfolds = 5, partition_index = NULL, seed = 1,
                n_threshold = 100, selection_criterion = "TSS",
                exclude_bimodal = FALSE, tolerance = 0.01,
                out_dir = NULL, parallel = FALSE,
                n_cores = NULL, verbose = TRUE)

Arguments

data

data.frame or matrix of data to be used in model calibration. Columns represent dependent and independent variables.

dependent

(character) name of dependent variable.

independent

(character) vector of name(s) of independent variable(s).

weights

(numeric) a vector with the weights for observations.

response_type

(character) a character string that must contain "l", "p", "q", or a combination of them. l = linear, q = quadratic, p = product (interaction between two variables). Default = "l".

formula_mode

(character) a character string to indicate the strategy to create the formulas for candidate models. Options are: "light", "moderate", "intensive", or "complex". Default = "moderate". "complex" returns only the most complex formula defined in response_type.

minvar

(numeric) minimum number of independent variables in formulas.

maxvar

(numeric) maximum number of independent variables in formulas.

user_formulas

(character) vector with formula(s) to test. Default = NULL.

cv_kfolds

(numeric) number of folds to use for k-fold cross-validation exercises. Default = 5. Ignored if partition_index is defined.

partition_index

list of indices for cross-validation in k-fold. The default, NULL, uses the function kfold_partition.
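
As a rough illustration of what a k-fold partition is, the base R sketch below assigns made-up row numbers to five folds at random. It is not the kfold_partition implementation, and the exact list structure that partition_index expects is an assumption here; check the output of kfold_partition before building indices by hand.

```r
# Hypothetical illustration: one vector of row indices per fold.
# n_rows and k are made-up values, not taken from enm_data.
set.seed(1)
n_rows <- 20
k <- 5

# Give each row a fold label (1..k), then shuffle the labels.
fold_id <- sample(rep(seq_len(k), length.out = n_rows))

# Split the row numbers by fold label; the result is a list of length k.
folds <- split(seq_len(n_rows), fold_id)
lengths(folds)
```

Each row appears in exactly one fold, so every observation is used once for testing when the folds are rotated.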

seed

(numeric) a seed for k-fold partitioning.

n_threshold

(numeric) number of threshold values to produce evaluation metrics. Default = 100.
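
To make the role of the thresholds concrete, the following base R sketch computes TSS (sensitivity + specificity - 1) at each of n_threshold cut-offs. The data are made up, and the even spacing of thresholds across the range of predictions is an assumption, not necessarily how enmpa places them.

```r
# Made-up observations (1 = presence, 0 = absence) and predicted probabilities.
observed  <- c(1, 1, 1, 0, 0, 0, 1, 0)
predicted <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)

# Assumption: thresholds evenly spaced across the range of predictions.
n_threshold <- 100
thresholds <- seq(min(predicted), max(predicted), length.out = n_threshold)

# TSS at a single threshold: sensitivity + specificity - 1.
tss_at <- function(th) {
  pred_class <- as.integer(predicted >= th)
  sensitivity <- sum(pred_class == 1 & observed == 1) / sum(observed == 1)
  specificity <- sum(pred_class == 0 & observed == 0) / sum(observed == 0)
  sensitivity + specificity - 1
}

tss <- vapply(thresholds, tss_at, numeric(1))

# Threshold with the highest TSS (first one in case of ties).
best <- which.max(tss)
c(threshold = thresholds[best], TSS = tss[best])
```

A larger n_threshold gives a finer search for the best cut-off at the cost of more computation per model.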

selection_criterion

(character) criterion used to select best models, options are "TSS" and "ESS". Default = "TSS".

exclude_bimodal

(logical) whether to filter out models with one or more variables presenting concave responses. Default = FALSE.

tolerance

(numeric) value to modify the limit value of the metric used to filter models during model selection if none of the models meet initial considerations. Default = 0.01.

out_dir

(character) output directory name to save the main calibration results. Default = NULL.

parallel

(logical) whether to run in parallel or sequentially. Default = FALSE.

n_cores

(numeric) number of cores to use. Default = NULL, in which case the number of free processors - 1 is used.

verbose

(logical) whether to print messages and show progress bar. Default = TRUE.

Details

Model evaluation considers the ability to predict presences and absences, as well as model fit and complexity. Model selection consists of three steps: 1) a first filter keeps the models with ROC AUC >= 0.5 (statistically significant models), 2) a second filter retains only the models that meet the selection_criterion ("TSS": TSS >= 0.4; or "ESS": maximum Accuracy - tolerance), and 3) from those, the models with delta AIC <= 2 are picked.
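
The three selection steps can be sketched in base R on a made-up table of model statistics. The column names (roc_auc, tss, aic) and all values are hypothetical and do not reflect the structure of the package output. Computing delta AIC among the models that pass the first two filters is also an assumption, and the tolerance adjustment is left out.

```r
# Hypothetical summary table; names and values are made up for illustration.
stats <- data.frame(
  model   = c("m1", "m2", "m3", "m4", "m5"),
  roc_auc = c(0.45, 0.80, 0.85, 0.78, 0.90),
  tss     = c(0.10, 0.55, 0.60, 0.35, 0.70),
  aic     = c(210, 150, 151.5, 149, 160),
  stringsAsFactors = FALSE
)

# Step 1: keep models with ROC AUC >= 0.5.
step1 <- stats[stats$roc_auc >= 0.5, ]

# Step 2: keep models that meet the "TSS" criterion (TSS >= 0.4).
step2 <- step1[step1$tss >= 0.4, ]

# Step 3: among those, keep models with delta AIC <= 2.
# Assumption: delta AIC is relative to the lowest AIC of the remaining models.
step2$delta_aic <- step2$aic - min(step2$aic)
selected <- step2[step2$delta_aic <= 2, ]

print(selected$model)
```

In this toy table m1 fails the AUC filter, m4 fails the TSS filter, and m5 is dropped by the delta AIC filter, which leaves m2 and m3.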

formula_mode options determine the strategy used to iterate over the predictors and the response types defined in response_type when creating candidate models.
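
As a sketch of how response types translate into formula terms, the hypothetical helper below builds the single most complex formula for a set of predictors, which is roughly what the "complex" mode is described as returning. The helper name complex_formula and the term notation (I(x^2) for quadratics, x1:x2 for products) are assumptions; the formulas generated by the package may be written differently.

```r
# Hypothetical helper, not part of enmpa: builds the most complex formula
# implied by a response_type string for a set of predictors.
complex_formula <- function(dependent, independent, response_type = "l") {
  types <- strsplit(response_type, "")[[1]]
  terms <- character(0)

  # Linear terms: one per predictor.
  if ("l" %in% types) {
    terms <- c(terms, independent)
  }
  # Quadratic terms: one squared term per predictor.
  if ("q" %in% types) {
    terms <- c(terms, paste0("I(", independent, "^2)"))
  }
  # Product terms: one per pair of predictors.
  if ("p" %in% types && length(independent) > 1) {
    pairs <- utils::combn(independent, 2)
    terms <- c(terms, paste(pairs[1, ], pairs[2, ], sep = ":"))
  }

  paste(dependent, "~", paste(terms, collapse = " + "))
}

f <- complex_formula("Sp", c("bio_1", "bio_12"), "lpq")
print(f)
```

The other modes would generate subsets of these terms, so the number of candidate models grows quickly with the number of predictors and response types.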

Value

An object of the class enmpa_calibration containing: selected models, a summary of statistics for all models, results obtained in cross-validation for all models, original data used, weights, and data-partition indices used.

Examples

# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Calibration using linear (l), quadratic (q), and product (p) responses.
cal_res <- calibration_glm(data = enm_data, dependent = "Sp",
                           independent = c("bio_1", "bio_12"),
                           response_type = "lpq", formula_mode = "moderate",
                           selection_criterion = "TSS", cv_kfolds = 3,
                           exclude_bimodal = TRUE, verbose = FALSE)
print(cal_res)
summary(cal_res)

head(cal_res$calibration_results)
head(cal_res$summary)
head(cal_res$selected)
head(cal_res$data)


[Package enmpa version 0.1.8 Index]