| isolationForest {solitude} | R Documentation |
Fit an Isolation Forest
Description
The 'solitude' class implements the isolation forest method
introduced in the paper Isolation-Based Anomaly Detection (Liu, Ting and Zhou
<doi:10.1145/2133360.2133363>). The extremely randomized trees (extratrees)
required to build the isolation forest are grown using the
ranger function from the ranger package.
Design
$new() initializes a new 'solitude' object. The
possible arguments are:
- sample_size: (positive integer, default = 256) Number of observations from the dataset used to build each tree in the forest
- num_trees: (positive integer, default = 100) Number of trees to be built in the forest
- replace: (boolean, default = FALSE) Whether the sample of observations should be drawn with replacement when sample_size is less than the number of observations in the dataset
- seed: (positive integer, default = 101) Random seed for the forest
- nproc: (NULL or a positive integer, default: NULL, which means use all resources) Number of parallel threads to be used by ranger
- respect_unordered_factors: (string, default: "partition") See the respect.unordered.factors argument in ranger
- max_depth: (positive number, default: ceiling(log2(sample_size))) See the max.depth argument in ranger
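As a quick check of the default max_depth expression listed above, the following base R sketch evaluates ceiling(log2(sample_size)) for a few sample sizes. The helper name default_max_depth is hypothetical and is not part of the package.

```r
# Hypothetical helper that mirrors the documented default for max_depth
default_max_depth <- function(sample_size) {
  ceiling(log2(sample_size))
}

# Depth limits implied by a few sample sizes, including the default of 256
sizes <- c(64, 256, 300, 1000)
data.frame(sample_size = sizes, max_depth = default_max_depth(sizes))
```

With the default sample_size of 256, the depth limit works out to 8.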
$fit() fits an isolation forest for the given dataframe or sparse matrix, computes
the depths of the terminal nodes of each tree, and stores the anomaly scores and
average depth values in the $scores object as a data.table.
$predict() returns anomaly scores for new data as a data.table.
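To illustrate how an average depth relates to an anomaly score, here is a minimal base R sketch of the scoring rule from the cited paper, where the score is 2^(-E(h(x)) / c(n)) and c(n) is the average path length of an unsuccessful binary search tree search over n points. The function names path_norm and anomaly_score are hypothetical, and whether the package applies exactly this normalization internally is an assumption.

```r
# c(n) from the cited paper: 2 * H(n - 1) - 2 * (n - 1) / n for n > 2,
# 1 for n == 2 and 0 otherwise, with H(i) approximated by log(i) + Euler's constant
path_norm <- function(n) {
  if (n > 2) {
    2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
  } else if (n == 2) {
    1
  } else {
    0
  }
}

# Shallower average depth gives a score closer to 1 (more anomalous)
anomaly_score <- function(average_depth, sample_size) {
  2^(-average_depth / path_norm(sample_size))
}

anomaly_score(c(2, 6, 10), sample_size = 256)
```

Under this rule, an observation whose average depth equals c(n) scores exactly 0.5, and scores rise toward 1 as the observation is isolated in fewer splits.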
Details
Parallelization:
ranger is parallelized and by default uses all available resources; this is the behaviour when nproc is set to NULL. The process of obtaining the depths of terminal nodes (which is executed when $fit() is called) may be parallelized separately by setting up a future backend.
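A minimal sketch of setting up a future backend before fitting is shown here. It assumes the future package is installed; the multisession plan, the choice of two workers, and the use of the numeric columns of the built-in iris data are illustrative assumptions rather than recommendations from the package.

```r
library("solitude")
library("future")

# Register a backend for the depth computation (two local R sessions)
plan(multisession, workers = 2)

# nproc is left at NULL, so ranger decides its own thread count
iso <- isolationForest$new(sample_size = 64)
iso$fit(iris[, 1:4])
head(iso$scores)

# Return to sequential processing once done
plan(sequential)
```

Resetting the plan to sequential at the end avoids leaving background workers running for the rest of the session.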
Methods
Public methods
Method new()
Usage
isolationForest$new(
  sample_size = 256,
  num_trees = 100,
  replace = FALSE,
  seed = 101,
  nproc = NULL,
  respect_unordered_factors = NULL,
  max_depth = ceiling(log2(sample_size))
)
Method fit()
Usage
isolationForest$fit(dataset)
Method predict()
Usage
isolationForest$predict(data)
Method clone()
The objects of this class are cloneable with this method.
Usage
isolationForest$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Examples
## Not run:
library("solitude")
library("tidyverse")
library("mlbench")

# load the data and drop the label before splitting into train and test halves
data(PimaIndiansDiabetes)
PimaIndiansDiabetes = as_tibble(PimaIndiansDiabetes)
PimaIndiansDiabetes

splitter = PimaIndiansDiabetes %>%
  select(-diabetes) %>%
  rsample::initial_split(prop = 0.5)
pima_train = rsample::training(splitter)
pima_test = rsample::testing(splitter)

# fit an isolation forest with the default settings
iso = isolationForest$new()
iso$fit(pima_train)

# anomaly scores for the training data, highest first
scores_train = pima_train %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))
scores_train

# project the training data to two dimensions and attach the scores
umap_train = pima_train %>%
  scale() %>%
  uwot::umap() %>%
  setNames(c("V1", "V2")) %>%
  as_tibble() %>%
  rowid_to_column() %>%
  left_join(scores_train, by = c("rowid" = "id"))
umap_train

# plot the projection with point size proportional to the anomaly score
umap_train %>%
  ggplot(aes(V1, V2)) +
  geom_point(aes(size = anomaly_score))

# anomaly scores for the held-out data, highest first
scores_test = pima_test %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))
scores_test
## End(Not run)