fit_hglm_occupancy_models {surveyvoi} | R Documentation |
Fit hierarchical generalized linear models to predict occupancy
Description
Estimate probability of occupancy for a set of features in a set of
planning units. Models are fitted as hierarchical generalized linear models
that account for imperfect detection (following Royle & Link 2006)
using JAGS (via runjags::run.jags()). To limit over-fitting,
covariate coefficients are sampled using a Laplace prior distribution
(equivalent to L1 regularization used in machine learning contexts)
(Park & Casella 2008).
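The link between a Laplace prior and L1 regularization can be checked numerically. The sketch below is a base R illustration, not code from the package: dlaplace() is a hypothetical helper defined here, and the scale value is an arbitrary assumption. It shows that the negative log density of a Laplace prior is the absolute value of the coefficient (divided by the scale) plus a constant, which is the form of an L1 penalty.

```r
# Hypothetical helper: density of a zero-centred Laplace distribution.
dlaplace <- function(x, scale = 1) {
  exp(-abs(x) / scale) / (2 * scale)
}

# Arbitrary coefficient values and prior scale, chosen for illustration only.
beta <- c(-2, -0.5, 0, 0.5, 2)
scale <- 0.75

# Negative log prior density for each coefficient.
neg_log_prior <- -log(dlaplace(beta, scale = scale))

# L1 penalty with weight 1 / scale, plus a constant that does not depend on beta.
l1_penalty <- abs(beta) / scale + log(2 * scale)

# The two quantities agree up to floating point error.
print(all.equal(neg_log_prior, l1_penalty))
```

A smaller scale corresponds to a heavier penalty, so coefficients are shrunk more strongly towards zero.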
Usage
fit_hglm_occupancy_models(
site_data,
feature_data,
site_detection_columns,
site_n_surveys_columns,
site_env_vars_columns,
feature_survey_sensitivity_column,
feature_survey_specificity_column,
jags_n_samples = rep(10000, length(site_detection_columns)),
jags_n_burnin = rep(1000, length(site_detection_columns)),
jags_n_thin = rep(100, length(site_detection_columns)),
jags_n_adapt = rep(1000, length(site_detection_columns)),
jags_n_chains = rep(4, length(site_detection_columns)),
n_folds = rep(5, length(site_detection_columns)),
n_threads = 1,
seed = 500,
verbose = FALSE
)
Arguments
site_data |
|
feature_data |
|
site_detection_columns |
|
site_n_surveys_columns |
|
site_env_vars_columns |
|
feature_survey_sensitivity_column |
|
feature_survey_specificity_column |
|
jags_n_samples |
|
jags_n_burnin |
|
jags_n_thin |
|
jags_n_adapt |
|
jags_n_chains |
|
n_folds |
|
n_threads |
|
seed |
|
verbose |
|
Details
This function (i) prepares the data for model fitting, (ii) fits the models, and (iii) assesses the performance of the models. These analyses are performed separately for each feature. For a given feature:
1. The data are prepared for model fitting by partitioning them using k-fold cross-validation (set via the n_folds argument). The training and evaluation folds are constructed so that each training and evaluation fold contains at least one presence and one absence observation.
2. A model is fit separately for each fold (see inst/jags/model.jags for model code). To assess convergence, the multi-variate potential scale reduction factor (MPSRF) statistic is calculated for each model.
3. The performance of the cross-validation models is evaluated. Specifically, the true skill statistic (TSS), sensitivity, and specificity statistics are calculated (if relevant, weighted by the argument to site_weights_data). These performance values are calculated using the models' training and evaluation folds. To assess convergence, the maximum MPSRF statistic for the models fit for each feature is calculated.
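One way to build folds in which every training and evaluation fold contains both presences and absences is to assign fold labels separately within each class. The sketch below is a base R illustration under that assumption; make_stratified_folds() is a hypothetical helper, and the package's own fold construction may differ.

```r
# Hypothetical helper: assign each site to one of k folds, cycling fold labels
# within the presence class and within the absence class separately.
make_stratified_folds <- function(y, k) {
  stopifnot(all(y %in% c(0, 1)), k >= 2,
            sum(y == 1) >= k, sum(y == 0) >= k)
  fold <- integer(length(y))
  for (cls in c(0, 1)) {
    idx <- which(y == cls)
    # Shuffle positions with sample.int() so a single index is handled safely.
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold
}

# Illustrative detection data: 20 absences and 10 presences.
set.seed(500)
y <- rep(c(0, 1), times = c(20, 10))
folds <- make_stratified_folds(y, k = 5)

# Each row is a fold; both columns (absence, presence) should be non-zero.
print(table(fold = folds, observed = y))
```

Because each class contributes at least one site to every fold, the evaluation fold and the remaining training folds each contain both classes.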
Value
A list object containing:
- models
list of list objects containing the models.
- predictions
tibble::tibble() object containing predictions for each feature.
- performance
tibble::tibble() object containing the performance of the best models for each feature. It contains the following columns:
- feature
name of the feature.
- max_mpsrf
maximum multi-variate potential scale reduction factor (MPSRF) value for the models. An MPSRF value less than 1.05 is taken to mean that all coefficients in a given model have converged, so a value less than 1.05 in this column means that all the models fit for a given feature have converged.
- train_tss_mean
mean TSS statistic for models calculated using training data in cross-validation.
- train_tss_std
standard deviation in TSS statistics for models calculated using training data in cross-validation.
- train_sensitivity_mean
mean sensitivity statistic for models calculated using training data in cross-validation.
- train_sensitivity_std
standard deviation in sensitivity statistics for models calculated using training data in cross-validation.
- train_specificity_mean
mean specificity statistic for models calculated using training data in cross-validation.
- train_specificity_std
standard deviation in specificity statistics for models calculated using training data in cross-validation.
- test_tss_mean
mean TSS statistic for models calculated using test data in cross-validation.
- test_tss_std
standard deviation in TSS statistics for models calculated using test data in cross-validation.
- test_sensitivity_mean
mean sensitivity statistic for models calculated using test data in cross-validation.
- test_sensitivity_std
standard deviation in sensitivity statistics for models calculated using test data in cross-validation.
- test_specificity_mean
mean specificity statistic for models calculated using test data in cross-validation.
- test_specificity_std
standard deviation in specificity statistics for models calculated using test data in cross-validation.
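The mean and standard deviation columns above summarize per-fold statistics. The sketch below shows, in base R, how sensitivity, specificity, and TSS (sensitivity + specificity - 1) can be computed from binary observations and predictions and then summarized across folds. calc_performance() and the fold data are hypothetical, and the sketch ignores the site weighting and detection-error adjustments that the function itself may apply.

```r
# Hypothetical helper: performance statistics from binary vectors.
calc_performance <- function(observed, predicted) {
  tp <- sum(observed == 1 & predicted == 1)
  fn <- sum(observed == 1 & predicted == 0)
  tn <- sum(observed == 0 & predicted == 0)
  fp <- sum(observed == 0 & predicted == 1)
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  c(tss = sensitivity + specificity - 1,
    sensitivity = sensitivity,
    specificity = specificity)
}

# Illustrative placeholder data for two evaluation folds.
fold_data <- list(
  list(observed = c(1, 1, 0, 0), predicted = c(1, 0, 0, 0)),
  list(observed = c(1, 0, 0, 1), predicted = c(1, 1, 0, 1))
)

# One column per fold, one row per statistic.
fold_stats <- sapply(fold_data, function(f) {
  calc_performance(f$observed, f$predicted)
})

# Summaries analogous to the *_mean and *_std columns.
print(rowMeans(fold_stats))
print(apply(fold_stats, 1, sd))
```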
Dependencies
This function requires the JAGS software to be installed. For information on installing the JAGS software, please consult the documentation for the rjags package.
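Before fitting models, it can help to confirm that a JAGS executable is visible to R. The sketch below uses only base R and assumes the executable is named jags and is on the system search path. On some platforms (for example Windows) JAGS may be installed without being on the path, so a FALSE result is a hint rather than proof that JAGS is missing. jags_on_path() is a hypothetical helper.

```r
# Hypothetical helper: TRUE if an executable named "jags" is found on the PATH.
jags_on_path <- function() {
  nzchar(unname(Sys.which("jags")))
}

if (jags_on_path()) {
  message("Found JAGS at: ", Sys.which("jags"))
} else {
  message("JAGS was not found on the PATH; it may need to be installed ",
          "or its location supplied manually.")
}
```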
References
Park T & Casella G (2008) The Bayesian lasso. Journal of the American Statistical Association, 103: 681–686.
Royle JA & Link WA (2006) Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87: 835–841.
Examples
## Not run:
# set seeds for reproducibility
set.seed(123)
# simulate data for 30 sites, 2 features, and 3 environmental variables
site_data <- simulate_site_data(n_sites = 30, n_features = 2, prop = 0.1)
feature_data <- simulate_feature_data(n_features = 2, prop = 1)
# print JAGS model code
cat(readLines(system.file("jags", "model.jags", package = "surveyvoi")),
    sep = "\n")
# fit models
# note that we use a small number of MCMC iterations so that the example
# finishes quickly; you probably want to use the defaults for real work
results <- fit_hglm_occupancy_models(
  site_data, feature_data,
  c("f1", "f2"), c("n1", "n2"), c("e1", "e2", "e3"),
  "survey_sensitivity", "survey_specificity",
  n_folds = rep(5, 2),
  jags_n_samples = rep(250, 2), jags_n_burnin = rep(250, 2),
  jags_n_thin = rep(1, 2), jags_n_adapt = rep(100, 2),
  n_threads = 1)
# print model predictions
print(results$predictions)
# print model performance
print(results$performance, width = Inf)
## End(Not run)
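After fitting, the max_mpsrf column can be screened against the 1.05 threshold described in the Value section. The sketch below uses a small stand-in data frame with made-up placeholder values so that it runs without JAGS; with real output, results$performance would take the place of the hypothetical performance object.

```r
# Stand-in for results$performance; the values are illustrative placeholders.
performance <- data.frame(
  feature = c("f1", "f2"),
  max_mpsrf = c(1.01, 1.20)
)

# Features whose models do not meet the convergence threshold.
threshold <- 1.05
not_converged <- performance$feature[performance$max_mpsrf >= threshold]

if (length(not_converged) > 0) {
  message("Models may not have converged for: ",
          paste(not_converged, collapse = ", "),
          "; consider more samples, burn-in, or adaptation iterations.")
}
```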