lookout {lookout}    R Documentation

Identifies outliers using the algorithm lookout.

Description

This function identifies outliers using the lookout algorithm, an outlier detection method based on leave-one-out kernel density estimates and generalized Pareto distributions.

Usage

lookout(X, alpha = 0.05, unitize = TRUE, bw = NULL, gpd = NULL, fast = TRUE)

Arguments

X

The input data, as a data frame, matrix, or tibble.

alpha

The level of significance. Default is 0.05.

unitize

An option to normalize the data. Default is TRUE, which normalizes each column to [0,1].

bw

Bandwidth parameter. If NULL (the default), the bandwidth is found using persistent homology.

gpd

Generalized Pareto distribution parameters. If 'NULL' (the default), these are estimated from the data.

fast

If TRUE (the default), speeds up the computation by subsetting the data for the bandwidth calculation.
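
The unitize option above normalizes each column to [0,1]. As a rough illustration of what that means, a min-max rescaling can be sketched in base R. The helper name unitize_columns is hypothetical and is not part of the package, and the handling of constant columns is an assumption; the package's own implementation may differ.

```r
# Hypothetical helper, not part of lookout: min-max rescale each column to [0, 1].
unitize_columns <- function(X) {
  X <- as.matrix(X)
  apply(X, 2, function(col) {
    rng <- range(col)
    # Assumption: a constant column is mapped to 0 to avoid dividing by zero.
    if (rng[1] == rng[2]) return(rep(0, length(col)))
    (col - rng[1]) / (rng[2] - rng[1])
  })
}

# Small demonstration on two columns with very different scales.
X_demo <- data.frame(x = c(1, 3, 5), y = c(10, 20, 40))
U <- unitize_columns(X_demo)
```

After rescaling, every column has minimum 0 and maximum 1, so no single variable dominates the distances used in the density estimate merely because of its units.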

Value

A list with the following components:

outliers

The set of outliers.

outlier_probability

The GPD probability of the data.

outlier_scores

The outlier scores of the data.

bandwidth

The bandwidth selected using persistent homology.

kde

The kernel density estimate values.

lookde

The leave-one-out KDE values.

gpd

The fitted GPD parameters.
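
To illustrate the difference between the kde and lookde components, the following base R sketch computes an ordinary and a leave-one-out kernel density estimate at each data point. It assumes one-dimensional data, a Gaussian kernel, and a fixed bandwidth h chosen by hand; the helper loo_kde_1d is hypothetical and is not the package's implementation, which selects the bandwidth using persistent homology and may use a different kernel.

```r
# Hypothetical helper, not part of lookout: 1-D Gaussian KDE evaluated at the
# data points, with and without each point's own contribution.
loo_kde_1d <- function(x, h) {
  n <- length(x)
  # n x n matrix of kernel weights K((x_i - x_j) / h) / h
  K <- dnorm(outer(x, x, "-") / h) / h
  kde <- rowSums(K) / n
  # Leave-one-out: drop the diagonal term (the point itself) and renormalize.
  lookde <- (rowSums(K) - diag(K)) / (n - 1)
  list(kde = kde, lookde = lookde)
}

x_demo <- c(0, 0.1, -0.1, 0.2, 8)   # the last point is isolated
d <- loo_kde_1d(x_demo, h = 0.5)
```

In this sketch the isolated point keeps a noticeable ordinary density, because it contributes to its own estimate, while its leave-one-out density drops to almost zero. That gap is what makes leave-one-out estimates useful for flagging outliers.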

Examples

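# Load the package (assumed to be installed)
library(lookout)

# 500 standard-normal points plus a small cluster of 5 points near (10, 10)
X <- rbind(
  data.frame(x = rnorm(500),
             y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)
# Run lookout with the default settings
lo <- lookout(X)
# Print the outlier-detection results
lo
# Plot the results (autoplot() is a ggplot2 generic)
autoplot(lo)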

[Package lookout version 0.1.4 Index]