isotree_po {itsdm} | R Documentation
Build an isolation forest species distribution model and explain the model and its outputs.
Description
Fit an isolation forest (or one of its variants) for species distribution modeling, and optionally call a collection of other functions to explain the model.
Usage
isotree_po(
obs_mode = "imperfect_presence",
obs,
obs_ind_eval = NULL,
variables,
categ_vars = NULL,
contamination = 0.1,
ntrees = 100L,
sample_size = 1,
ndim = 1L,
seed = 10L,
...,
offset = 0,
response = TRUE,
spatial_response = TRUE,
check_variable = TRUE,
visualize = FALSE
)
Arguments
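obs_mode | (character) The mode of the training observations: one of "perfect_presence", "imperfect_presence" or "presence_absence". See Details. The default is "imperfect_presence".
obs | (sf) The observations used for training. They can be prepared with format_observation, as in the Examples.
obs_ind_eval | (sf or NULL) Optional independent test occurrences. The default is NULL.
variables | (stars) The stack of environmental variables.
categ_vars | (character or NULL) The names of the categorical variables in variables. The default is NULL.
contamination | (numeric) The proportion of outlying cases used to decide how many background points or absences are added to the training data. See Details. The default is 0.1 (10%).
ntrees | (integer) The number of trees in the isolation forest. The default is 100L.
sample_size | (numeric) The fraction of the training data, in [0, 1], sampled to grow each tree. The default is 1, i.e. the full dataset.
ndim | (integer) The number of variables combined at each split. With 1 the model is a standard isolation forest; with larger values it is an extended isolation forest. The default is 1L.
seed | (integer) The random seed used in modeling. The default is 10L.
... | Other arguments passed to isolation.forest, e.g. nthreads as in the Examples.
offset | (numeric) The offset used to adjust the fitted suitability. The default is 0.
response | (logical) If TRUE, compute response curves. The default is TRUE.
spatial_response | (logical) If TRUE, compute spatial response maps. The default is TRUE.
check_variable | (logical) If TRUE, run the variable importance analysis. The default is TRUE.
visualize | (logical) If TRUE, draw the main figures related to the model. The default is FALSE.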
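The ntrees, sample_size and ndim arguments control how each isolation tree is grown. As a rough illustration only, the base-R sketch below draws one per-tree subsample and performs a single split. It assumes the classic iForest rule (one random variable, a uniform cut between its minimum and maximum) for ndim = 1, and a simplified random-hyperplane rule (random normal coefficients on ndim variables, then a uniform cut on the projection) for ndim > 1. The helpers subsample_rows and random_split are hypothetical; isotree's own splitting logic has many more options.

```r
# Hypothetical helpers illustrating one isolation-tree split; not itsdm or isotree code.

# Draw the rows used to grow one tree: sample_size is a fraction of the data.
subsample_rows <- function(n, sample_size) {
  sample.int(n, size = max(1L, round(sample_size * n)))
}

# Split the rows of a numeric matrix x using ndim randomly chosen columns.
random_split <- function(x, ndim = 1L) {
  cols <- sample.int(ncol(x), ndim)
  if (ndim == 1L) {
    # Classic iForest (assumed): axis-parallel cut between the column's min and max
    proj <- x[, cols]
  } else {
    # Simplified extended iForest (assumed): project onto a random direction
    w <- rnorm(ndim)
    proj <- drop(x[, cols, drop = FALSE] %*% w)
  }
  cut <- runif(1, min(proj), max(proj))
  list(cols = cols, left = which(proj < cut), right = which(proj >= cut))
}

# Usage with placeholder data standing in for three environmental variables
set.seed(123)
env <- cbind(bio1 = rnorm(50), bio5 = rnorm(50), bio12 = rnorm(50))
rows <- subsample_rows(nrow(env), sample_size = 0.6)  # 30 of 50 rows
s1 <- random_split(env[rows, ], ndim = 1L)            # standard split
s2 <- random_split(env[rows, ], ndim = 2L)            # hyperplane split
```

Every subsampled row ends up on exactly one side of the cut; a full tree would repeat the split recursively until points are isolated or a depth limit is reached.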
Details
For "perfect_presence", a number of background samples determined by contamination will be added to the training data so that iForest, an outlier-detection algorithm, has some outlying cases to isolate.
For "imperfect_presence", no further action is required.
For "presence_absence", a contamination fraction of the absences will be randomly selected and used together with all presences to train the model.
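A minimal base-R sketch of how the three obs_mode settings could assemble the training rows. It assumes a data.frame with an observation column coded 1 for presence and 0 for absence, and a data.frame of background points. For "perfect_presence" it assumes the number of added background points is round(contamination * number of presences); for "presence_absence" it follows the text above and keeps round(contamination * number of absences) absences. The helper build_training_set and both count rules are illustrative assumptions, not itsdm's internals.

```r
# Hypothetical sketch of how obs_mode could shape the training rows; not itsdm code.
build_training_set <- function(obs, background, obs_mode, contamination = 0.1) {
  pres <- obs[obs$observation == 1, , drop = FALSE]
  if (obs_mode == "imperfect_presence") {
    # Presences are assumed to contain some outliers already: use them as-is
    return(pres)
  }
  if (obs_mode == "perfect_presence") {
    # Assumption: add round(contamination * n_presences) background points
    pool <- background
    pool$observation <- 0
    n_add <- round(contamination * nrow(pres))
  } else if (obs_mode == "presence_absence") {
    # Randomly keep a contamination fraction of the absences
    pool <- obs[obs$observation == 0, , drop = FALSE]
    n_add <- round(contamination * nrow(pool))
  } else {
    stop("Unknown obs_mode: ", obs_mode)
  }
  added <- pool[sample.int(nrow(pool), n_add), , drop = FALSE]
  rbind(pres, added)  # rbind matches data.frame columns by name
}

# Usage with placeholder data: 30 presences, 10 absences, 500 background points
set.seed(1)
obs <- data.frame(x = runif(40), y = runif(40),
                  observation = rep(c(1, 0), times = c(30, 10)))
bg <- data.frame(x = runif(500), y = runif(500))
nrow(build_training_set(obs, bg, "imperfect_presence"))  # 30
nrow(build_training_set(obs, bg, "perfect_presence"))    # 30 + 3 = 33
nrow(build_training_set(obs, bg, "presence_absence"))    # 30 + 1 = 31
```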
NOTE: obs_mode only applies to obs; obs_ind_eval follows its own structure.
For details of the algorithm, see https://github.com/david-cortes/isotree and the R documentation of isolation.forest.
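For reference, Liu et al. (2008) define the anomaly score of a point x in a forest grown on subsamples of size n as s(x, n) = 2^(-E[h(x)] / c(n)), where h(x) is the path length of x in one tree and c(n) = 2H(n - 1) - 2(n - 1)/n is the average path length of an unsuccessful binary search tree lookup, H being the harmonic number. The base-R sketch below only evaluates that formula; it uses exact harmonic numbers (the paper approximates H(i) by ln(i) + 0.5772156649), and it does not show how isotree adjusts the score or how itsdm turns it into suitability.

```r
# Average path length c(n) of an unsuccessful BST search (Liu et al., 2008)
avg_path_length <- function(n) {
  if (n <= 1) return(0)
  harmonic <- sum(1 / seq_len(n - 1))  # exact harmonic number H(n - 1)
  2 * harmonic - 2 * (n - 1) / n
}

# Anomaly score s(x, n) from the mean path length of x over all trees
anomaly_score <- function(mean_path, n) {
  2^(-mean_path / avg_path_length(n))
}

c256 <- avg_path_length(256)  # about 10.25
anomaly_score(c256, 256)      # 0.5: path length equal to the average
anomaly_score(1, 256)         # about 0.93: isolated almost immediately
anomaly_score(20, 256)        # about 0.26: hard to isolate
```

Scores near 1 mark points that are easy to isolate (anomalies), while scores well below 0.5 mark points deep inside the bulk of the data.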
Value
(POIsotree) A list of:
model (isolation.forest) The trained isolation forest model.
variables (stars) The formatted image stack of environmental variables.
background_samples (sf) An sf of background points for training dataset evaluation or SHAP dependence plots.
independent_test (sf or NULL) An sf of the test occurrence dataset.
background_samples_test (sf or NULL) An sf of background points for test dataset evaluation or SHAP dependence plots.
vars_train (data.frame) A data.frame with the values of each environmental variable at the training occurrences.
pred_train (data.frame) A data.frame with the predicted values at the training occurrences.
eval_train (POEvaluation) A list of presence-only evaluation metrics based on the training dataset. See details of POEvaluation in evaluate_po.
var_test (data.frame or NULL) A data.frame with the values of each environmental variable at the test occurrences.
pred_test (data.frame or NULL) A data.frame with the predicted values at the test occurrences.
eval_test (POEvaluation or NULL) A list of presence-only evaluation metrics based on the test dataset. See details of POEvaluation in evaluate_po.
prediction (stars) The predicted environmental suitability.
marginal_responses (MarginalResponse or NULL) A list of marginal response values of each environmental variable. See details in marginal_response.
offset (numeric) The offset value set as input.
independent_responses (IndependentResponse or NULL) A list of independent response values of each environmental variable. See details in independent_response.
shap_dependences (ShapDependence or NULL) A list of variable dependence values of each environmental variable. See details in shap_dependence.
spatial_responses (SpatialResponse or NULL) A list of spatial variable dependence values of each environmental variable. See details in spatial_response.
variable_analysis (VariableAnalysis or NULL) A list of variable importance analyses based on multiple metrics. See details in variable_analysis.
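Several elements of the returned list are documented as "or NULL". The base-R sketch below uses a hypothetical mock list carrying a few of the documented element names (placeholder values, not a real model) to show one way of checking which outputs are available before plotting them. It is an assumption here that, for example, eval_test is NULL when no obs_ind_eval is supplied and that marginal_responses is NULL when response = FALSE. It also shows why the full name shap_dependences is safer: `$` partially matches element names, while `[[` matches exactly by default.

```r
# Hypothetical stand-in for a POIsotree result; values are placeholders.
mock_result <- list(
  eval_train = list(metric = "placeholder"),
  eval_test = NULL,           # e.g. no obs_ind_eval supplied (assumption)
  marginal_responses = NULL,  # e.g. response = FALSE (assumption)
  shap_dependences = list(),
  variable_analysis = list()
)

# Return the requested element names whose values are present and not NULL
available_outputs <- function(result, elements) {
  elements[vapply(elements, function(e) !is.null(result[[e]]), logical(1))]
}

available_outputs(mock_result,
                  c("eval_train", "eval_test",
                    "marginal_responses", "shap_dependences"))
# "eval_train" "shap_dependences"

# `$` silently partially matches; `[[` requires the exact name
mock_result$shap_dependence        # returns the shap_dependences element
mock_result[["shap_dependence"]]   # NULL: no exact match
```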
References
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008. doi:10.1109/ICDM.2008.17
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 1-39. doi:10.1145/2133360.2133363
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "On detecting clustered anomalies using SCiForest." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2010. doi:10.1007/978-3-642-15883-4_18
Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended isolation forest." IEEE Transactions on Knowledge and Data Engineering (2019). doi:10.1109/TKDE.2019.2947676
References for related features, such as response curves and variable importance, are listed under their own functions.
See Also
evaluate_po, marginal_response, independent_response, shap_dependence, spatial_response, variable_analysis, isolation.forest
Examples
########### Presence-absence mode #################
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# Load variables
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# Modeling
mod_virtual_species <- isotree_po(
obs_mode = "presence_absence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)
# Check results
## Evaluation based on training dataset
print(mod_virtual_species$eval_train)
plot(mod_virtual_species$eval_train)
## Response curves
plot(mod_virtual_species$marginal_responses)
plot(mod_virtual_species$independent_responses,
target_var = c('bio1', 'bio5'))
plot(mod_virtual_species$shap_dependences)
## Relationships between target var and related var
plot(mod_virtual_species$shap_dependences,
target_var = c('bio1', 'bio5'),
related_var = 'bio12', smooth_span = 0)
## Variable importance
mod_virtual_species$variable_analysis
plot(mod_virtual_species$variable_analysis)
########### Presence-only mode ##################
# Load example dataset (env_vars is reused from the example above)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
# Modeling with perfect_presence mode
mod_perfect_pres <- isotree_po(
obs_mode = "perfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)
# Modeling with imperfect_presence mode
mod_imperfect_pres <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)