isotree_po {itsdm}    R Documentation

Build an isolation forest species distribution model and explain the model and its outputs.

Description

Call Isolation forest and its variations to do species distribution modeling and optionally call a collection of other functions to do model explanation.

Usage

isotree_po(
  obs_mode = "imperfect_presence",
  obs,
  obs_ind_eval = NULL,
  variables,
  categ_vars = NULL,
  contamination = 0.1,
  ntrees = 100L,
  sample_size = 1,
  ndim = 1L,
  seed = 10L,
  ...,
  offset = 0,
  response = TRUE,
  spatial_response = TRUE,
  check_variable = TRUE,
  visualize = FALSE
)

Arguments

obs_mode

(string) The mode of observations for training. It should be one of c("perfect_presence", "imperfect_presence", "presence_absence"). "perfect_presence" means presence-only occurrences without errors/uncertainties/bias, which should be rare in reality. "imperfect_presence" means presence-only occurrences with errors/uncertainties/bias, which should be the most common case. "presence_absence" means presence-absence observations regardless of quality. See details to learn how to set it. The default is "imperfect_presence".

obs

(sf) The sf of observations for training. It is recommended to call function format_observation to format the occurrences (obs) before passing them here. Otherwise, make sure there is a column named "observation" for observation.

obs_ind_eval

(sf or NULL) Optional sf of observations for independent test. It is recommended to call function format_observation to format the occurrence (obs) before passing it here. Otherwise, make sure there is a column named "observation" for observation. If NULL, no independent test set will be used. The default is NULL.

variables

(RasterStack or stars) The stack of environmental variables.

categ_vars

(vector of character or NULL) The names of categorical variables. Must be the same as the names in variables.

contamination

(numeric) The proportion of abnormal cases within a dataset. iForest is an outlier detection algorithm that picks out abnormal cases (far fewer) from normal cases, so this argument sets how many abnormal cases are expected when the user is able to control it. See details for how to set it. The value should be less than 0.5. Here we constrain it to (0, 0.3]. The default value is 0.1.

ntrees

(integer) The number of trees for the isolation forest. It must be an integer; use function as.integer to convert a value if needed. The default is 100L.

sample_size

(numeric) The sampling rate, which should be a value in [0, 1]. The default is 1.0.

ndim

(integer) ExtensionLevel for isolation forest. It must be an integer; use function as.integer to convert a value if needed. Also, it must be no larger than the number of environmental variables. When it is 1, the model is a traditional isolation forest, otherwise the model is an extended isolation forest. The default is 1L.

seed

(integer) The random seed used in the modeling. It should be an integer. The default is 10L.

...

Other arguments that isolation.forest needs.

offset

(numeric) The offset to adjust fitted suitability. The default is zero. It is highly recommended to leave it at the default.

response

(logical) If TRUE, generate response curves. The default is TRUE.

spatial_response

(logical) If TRUE, generate spatial response maps. The default is TRUE, although generating them might be slow. NOTE that the SHAP-based map is not generated here because it is slow. If you want it mapped, call function spatial_response.

check_variable

(logical) If TRUE, check the variable importance. The default is TRUE.

visualize

(logical) If TRUE, generate the essential figures related to the model. The default is FALSE.
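
The documented ranges of the numeric arguments can be checked before calling isotree_po. The sketch below uses base R only. Note that check_isotree_args is a hypothetical helper, not part of itsdm; it assumes the ranges stated above, (0, 0.3] for contamination and [0, 1] for sample_size, and coerces ntrees and ndim with as.integer.

```r
# Hypothetical helper, not part of itsdm: validate and coerce arguments
# according to the ranges documented above.
check_isotree_args <- function(contamination = 0.1, ntrees = 100L,
                               sample_size = 1, ndim = 1L) {
  stopifnot(
    is.numeric(contamination), contamination > 0, contamination <= 0.3,
    is.numeric(sample_size), sample_size >= 0, sample_size <= 1,
    is.numeric(ntrees), ntrees == round(ntrees), ntrees >= 1,
    is.numeric(ndim), ndim == round(ndim), ndim >= 1
  )
  list(
    contamination = contamination,
    ntrees = as.integer(ntrees),   # whole-number doubles become integers
    sample_size = sample_size,
    ndim = as.integer(ndim)
  )
}

# Whole-number doubles such as 50 are accepted and converted
checked <- check_isotree_args(contamination = 0.2, ntrees = 50, ndim = 1)
str(checked)
```

The returned list could then be passed on, for example with do.call, together with obs and variables.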

Details

For "perfect_presence", a user-defined number (contamination) of samples will be taken from background to let iForest function normally.

If "imperfect_presence", no further action is required.

If the obs_mode is "presence_absence", a contamination percent of absences will be randomly selected and work together with all presences to train the model.
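
The "presence_absence" rule can be illustrated with a small base R sketch. It is not the package's code and rests on an assumption about what "a contamination percent of absences" means: here the selected absences are sized so that they make up roughly a contamination share of the final training set. The data frame and all object names are made up for illustration.

```r
set.seed(10)

# Made-up observations: 60 presences (1) and 40 absences (0)
obs <- data.frame(
  x = runif(100),
  y = runif(100),
  observation = rep(c(1, 0), times = c(60, 40))
)
contamination <- 0.1

pres <- obs[obs$observation == 1, ]
abs_all <- obs[obs$observation == 0, ]

# Assumption: absences should form about `contamination` of the training set,
# so n_abs / (n_pres + n_abs) is approximately equal to contamination
n_abs <- min(
  nrow(abs_all),
  round(nrow(pres) * contamination / (1 - contamination))
)

# Randomly select the absences and combine them with all presences
abs_sel <- abs_all[sample(nrow(abs_all), n_abs), ]
train <- rbind(pres, abs_sel)

table(train$observation)
```

Under this reading, all presences are kept and only the number of absences changes with contamination.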

NOTE: obs_mode and mode only apply to obs. obs_ind_eval will follow its own structure.

Please read details of algorithm isolation.forest on https://github.com/david-cortes/isotree, and the R documentation of function isolation.forest.
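
For orientation, the underlying isolation.forest function can also be called directly on a plain data frame. The sketch below is only an illustration with made-up data; it does not reproduce how isotree_po prepares observations or converts outlier scores to suitability. It assumes the isotree package is installed and skips the model fit otherwise.

```r
# Sketch only: runs when the isotree package is installed.
if (requireNamespace("isotree", quietly = TRUE)) {
  set.seed(1)

  # Made-up environmental values at 100 locations
  env <- data.frame(
    bio1 = rnorm(100, 22, 2),
    bio5 = rnorm(100, 30, 3),
    bio12 = rnorm(100, 900, 150)
  )

  model <- isotree::isolation.forest(
    env,
    ntrees = 10L,
    sample_size = 60L,  # 60 of the 100 rows are sampled for each tree
    ndim = 1L,          # 1 gives a traditional isolation forest
    seed = 123L,
    nthreads = 1        # the kind of argument forwarded through `...`
  )

  # Standardized outlier scores; higher values are more anomalous
  scores <- predict(model, env)
  print(summary(scores))
}
```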

Value

(POIsotree) A list of

References

See Also

evaluate_po, marginal_response, independent_response, shap_dependence, spatial_response, variable_analysis, isolation.forest

Examples


########### Presence-absence mode #################
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# Load variables
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Modeling
mod_virtual_species <- isotree_po(
  obs_mode = "presence_absence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10L,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)

# Check results
## Evaluation based on training dataset
print(mod_virtual_species$eval_train)
plot(mod_virtual_species$eval_train)

## Response curves
plot(mod_virtual_species$marginal_responses)
plot(mod_virtual_species$independent_responses,
     target_var = c('bio1', 'bio5'))
plot(mod_virtual_species$shap_dependence)

## Relationships between target var and related var
plot(mod_virtual_species$shap_dependence,
     target_var = c('bio1', 'bio5'),
     related_var = 'bio12', smooth_span = 0)

# Variable importance
mod_virtual_species$variable_analysis
plot(mod_virtual_species$variable_analysis)

########### Presence-only mode ##################
# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

# Modeling with perfect_presence mode
mod_perfect_pres <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10L,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)

# Modeling with imperfect_presence mode
mod_imperfect_pres <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10L,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)



[Package itsdm version 0.2.1 Index]