ph_anomaly {pheble}    R Documentation
Detect anomalies.
Description
The ph_anomaly function detects and removes anomalies with an autoencoder
(method = "ae") or an extended isolation forest (method = "iso"). Because it is
general purpose, it can be applied to a variety of data types. The hyperparameters
supplied through hyper_params (e.g., activation, hidden, dropout_ratio) can be given
as lists or vectors (see the argument details) to perform a grid search for the optimal
hyperparameter combination. The autoencoder with the lowest reconstruction error is
selected as the best model.
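The idea of scoring rows by reconstruction error can be illustrated without H2O. The sketch below is a minimal stand-in, not the package's implementation: it swaps the neural autoencoder for principal component analysis (a linear autoencoder trained with squared-error loss learns the same subspace as the top principal components), scores each row by its mean squared reconstruction error, and flags rows above an assumed 95th-percentile cutoff. The function name recon_anomalies, the rank k, the cutoff, and the use of the built-in iris data are illustrative assumptions.

```r
# Hypothetical helper: PCA used as a linear "autoencoder" stand-in.
# This is NOT how ph_anomaly works internally (it uses H2O models).
recon_anomalies <- function(x, k = 2, cutoff = 0.95) {
  x <- as.matrix(x)
  # Fit PCA on standardized data; the top-k loadings act as encoder/decoder.
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  w <- pca$rotation[, seq_len(k), drop = FALSE]
  z <- scale(x, center = pca$center, scale = pca$scale)
  # Encode to k dimensions, then decode back to the standardized space.
  recon <- (z %*% w) %*% t(w)
  # Anomaly score: per-row mean squared reconstruction error.
  score <- rowMeans((z - recon)^2)
  # Assumed rule: rows above the cutoff quantile are anomalies.
  threshold <- quantile(score, cutoff, names = FALSE)
  list(score = score, is_anomaly = score > threshold)
}

res <- recon_anomalies(iris[, 1:4], k = 2)
clean <- iris[!res$is_anomaly, ]  # analogous to the returned df component
```

Here res$score plays a role similar to the anom_score component, and clean mirrors the data frame with anomalies removed.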
Usage
ph_anomaly(
df,
ids_col,
class_col,
method = "ae",
scale = FALSE,
center = NULL,
sd = NULL,
max_mem_size = "15g",
port = 54321,
train_seed = 123,
hyper_params = list(),
search = "random",
tune_length = 100
)
Arguments
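df | A data frame containing the ID column, the class column, and the feature columns to screen for anomalies. |
ids_col | A character value giving the name of the column of sample identifiers (e.g., "Biosample"). |
class_col | A character value giving the name of the column of class labels (e.g., "Species"). |
method | A character value for the anomaly detection method: "ae" for an autoencoder (default) or "iso" for an extended isolation forest. |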
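scale | A logical value indicating whether to scale the data (default FALSE). |
center | Either a logical value or a numeric vector of per-column centering values (default NULL). |
sd | Either a logical value or a numeric vector of per-column standard deviations used for scaling (default NULL). |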
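max_mem_size | A character value for the maximum memory allocated to the H2O cluster (default "15g"). |
port | A numeric value for the port of the H2O cluster (default 54321). |
train_seed | A numeric value used as the random seed for model training, for reproducibility (default 123). |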
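hyper_params | A list of hyperparameters to search over (e.g., activation, hidden, dropout_ratio); each entry can be a vector or list of candidate values. |
search | A character value for the hyperparameter search strategy (default "random"). |
tune_length | A numeric value for the maximum number of models to try during the search (default 100). |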
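The "Either a ..." wording of center and sd matches base R's scale(), whose center and scale arguments accept either a logical value or a numeric vector with one entry per column. Assuming ph_anomaly treats center and sd the same way (with sd in the role of scale's scale argument), numeric vectors let you reuse statistics computed on a reference dataset. The base R sketch below shows that behavior on toy matrices; the variable names are illustrative only.

```r
# Toy reference data: column means are 2 and 5, column SDs are both 1.
train <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
new_obs <- matrix(c(2, 8), ncol = 2)

# Logical TRUE: center and scale are estimated from the data itself.
train_scaled <- scale(train, center = TRUE, scale = TRUE)
ctr <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")

# Numeric vectors: apply the reference statistics to new observations.
new_scaled <- scale(new_obs, center = ctr, scale = sds)
```

Because the statistics come from train rather than new_obs, the new row is expressed on the reference scale instead of being re-centered on itself.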
Value
A list containing the following components:
df | The data frame with anomalies removed. |
model | The best model from the grid search used to detect anomalies. |
anom_score | A data frame of predicted anomaly scores. |
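The model component is the winner of the hyperparameter search controlled by hyper_params, search, and tune_length. The base R sketch below shows the general idea of a random search under stated assumptions: it builds the Cartesian grid of a hypothetical search space, samples at most tune_length combinations with a fixed seed, and keeps the candidate with the lowest reconstruction error. It does not call H2O, and the helper names random_search and pick_best, as well as the grid values, are illustrative rather than part of pheble.

```r
# Hypothetical search space; names echo the Description, values are made up.
space <- list(
  activation = c("Tanh", "Rectifier"),
  hidden = c("10-5-10", "50-25-50"),
  dropout_ratio = c(0, 0.1, 0.2)
)

# Full Cartesian grid: 2 * 2 * 3 = 12 combinations.
full_grid <- expand.grid(space, stringsAsFactors = FALSE)

# Random search: sample at most tune_length rows without replacement.
# Note: set.seed() resets the global RNG state for reproducibility.
random_search <- function(grid, tune_length, seed = 123) {
  set.seed(seed)
  n <- min(tune_length, nrow(grid))
  grid[sample(nrow(grid), n), , drop = FALSE]
}

# Keep the candidate whose trained model had the lowest reconstruction error.
pick_best <- function(candidates, errors) {
  candidates[which.min(errors), , drop = FALSE]
}

candidates <- random_search(full_grid, tune_length = 5)
```

In practice each sampled row would be used to train a model, and errors would hold the resulting reconstruction errors. With the default tune_length = 100 and a 12-row grid, the sample is simply capped at 12.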
Examples
## Import data.
data(ph_crocs)
## Remove anomalies with autoencoder.
rm_outs <- ph_anomaly(df = ph_crocs, ids_col = "Biosample",
class_col = "Species", method = "ae")
## Alternatively, remove anomalies with an extended isolation forest. Note
## that port is set explicitly, because starting H2O sessions one after
## another can cause connection errors.
rm_outs <- ph_anomaly(df = ph_crocs, ids_col = "Biosample",
class_col = "Species", method = "iso",
port = 50001)