PageHinkley {datadriftR}    R Documentation

Page-Hinkley Test for Change Detection

Description

Implements the Page-Hinkley test, a sequential analysis technique used to detect changes in the average value of a continuous signal or process. It is effective in detecting small but persistent changes over time, making it suitable for real-time monitoring applications.

Details

The Page-Hinkley test is a type of cumulative sum (CUSUM) test. It accumulates the differences between each observation and a reference value, here the running mean of the values seen so far, and signals a change when the cumulative sum exceeds a predefined threshold. This makes it especially useful for early detection of subtle shifts in the behavior of the monitored process.
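
The update described above can be sketched in base R as follows. This is a hypothetical standalone illustration, not the package's code: it assumes the update rule of the scikit-multiflow implementation linked under References, and the names ph_init and ph_step are invented for this sketch. The actual datadriftR internals may differ.

```r
# Hypothetical standalone sketch of one Page-Hinkley update step.
# Assumption: mirrors the update rule of the linked scikit-multiflow code.
ph_init <- function() list(x_mean = 0, sample_count = 1, sum = 0)

ph_step <- function(state, x, delta = 0.005, threshold = 50,
                    alpha = 1 - 1e-04, min_instances = 30) {
  # Incremental running mean of the values seen so far
  state$x_mean <- state$x_mean + (x - state$x_mean) / state$sample_count
  # Discounted cumulative sum of deviations beyond the tolerance delta,
  # clamped at zero so that downward deviations do not accumulate
  state$sum <- max(0, alpha * state$sum + (x - state$x_mean - delta))
  state$sample_count <- state$sample_count + 1
  # Signal only after the warm-up period and once the sum passes the threshold
  state$change_detected <- state$sample_count >= min_instances &&
    state$sum > threshold
  state
}

# Usage: a stream whose mean jumps from 0 to 5 after 100 samples
stream <- c(rep(0, 100), rep(5, 100))
state <- ph_init()
first_alarm <- NA_integer_
for (i in seq_along(stream)) {
  state <- ph_step(state, stream[i])
  if (state$change_detected) {
    first_alarm <- i
    break
  }
}
first_alarm  # index of the first alarm, a few samples after the jump
```

Because the sum is clamped at zero, this sketch reacts only to upward shifts of the mean; the alarm arrives a few samples after the jump rather than at it, since the sum needs time to grow past the threshold.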

Public fields

min_instances

Minimum number of instances required to start detection.

delta

Minimal change magnitude considered significant for detection; smaller deviations are tolerated.

threshold

Decision threshold for signaling a change.

alpha

Forgetting factor that weights older data in the cumulative sum; values closer to 1 retain more history.

x_mean

Running mean of the observed values.

sample_count

Counter for the number of samples seen.

sum

Cumulative sum used in the change detection.

change_detected

Logical flag indicating whether a change (drift) has been detected.

Methods

Public methods


Method new()

Initializes the Page-Hinkley test with specific parameters.

Usage
PageHinkley$new(
  min_instances = 30,
  delta = 0.005,
  threshold = 50,
  alpha = 1 - 1e-04
)
Arguments
min_instances

Minimum number of samples before detection starts.

delta

Minimal change magnitude considered significant for detection.

threshold

Cumulative sum threshold for change detection.

alpha

Forgetting factor: weight given to older data in the cumulative sum.
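
To give a feel for the alpha argument, the following sketch computes how much weight a past deviation retains after k further samples. It assumes the cumulative sum is discounted multiplicatively at each update (sum_t = alpha * sum_(t-1) + ...), as in the scikit-multiflow implementation linked under References; retained_weight and the alternative value 0.99 are hypothetical and chosen only for comparison.

```r
# Assumption: each update multiplies the previous sum by alpha, so a
# deviation observed k samples ago carries weight alpha^k.
retained_weight <- function(alpha, k) alpha^k

k <- c(10, 100, 1000, 10000)
weights <- data.frame(
  samples_ago   = k,
  default_alpha = retained_weight(1 - 1e-04, k),  # default shown in Usage
  alpha_099     = retained_weight(0.99, k)        # hypothetical alternative
)
print(weights)
```

Under this assumption the default keeps roughly 90% of a deviation's weight after 1000 samples, whereas alpha = 0.99 has effectively forgotten it by then.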


Method reset()

Resets the detector's internal state to its initial values.

Usage
PageHinkley$reset()

Method add_element()

Adds a new element to the data stream and updates the detection status based on the Page-Hinkley test.

Usage
PageHinkley$add_element(x)
Arguments
x

New data value to add and evaluate.


Method detected_change()

Checks if a change has been detected based on the last update.

Usage
PageHinkley$detected_change()
Returns

Boolean indicating whether a change was detected.


Method clone()

The objects of this class are cloneable with this method.

Usage
PageHinkley$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
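
A possible use of clone() is to snapshot a detector before feeding it further data. The sketch below uses only the methods and fields documented on this page. It assumes the documented public fields hold plain values, in which case the default shallow clone already yields an independent copy; the input values are arbitrary, and the requireNamespace() guard simply skips the snippet when the package is not installed.

```r
# Guarded so the snippet is a no-op when datadriftR is not installed
if (requireNamespace("datadriftR", quietly = TRUE)) {
  ph <- datadriftR::PageHinkley$new()
  for (x in c(0.1, 0.2, 0.1)) ph$add_element(x)

  snapshot <- ph$clone()    # copy of the current detector state
  snapshot$add_element(10)  # only the copy sees this value

  # Compare the running means of the original and the copy
  cat("original mean:", ph$x_mean, "\n")
  cat("copy mean:    ", snapshot$x_mean, "\n")
}
```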

References

Page, E. S. (1954). Continuous Inspection Schemes. Biometrika, 41(1/2), 100–115.

Montiel, J., et al. (2018). Scikit-Multiflow: A Multi-output Streaming Framework. Journal of Machine Learning Research. This framework provides tools for multi-output and stream data mining and inspired parts of the implementation of this class.

Implementation: https://github.com/scikit-multiflow/scikit-multiflow/blob/a7e316d1cc79988a6df40da35312e00f6c4eabb2/src/skmultiflow/drift_detection/page_hinkley.py

Examples

library(datadriftR)

set.seed(123)  # Set a seed for reproducibility
data_part1 <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.7, 0.3))

# Introduce a change in data distribution
data_part2 <- sample(c(0, 5), size = 100, replace = TRUE, prob = c(0.3, 0.7))

# Combine the two parts
data_stream <- c(data_part1, data_part2)
ph <- PageHinkley$new()
for (i in seq_along(data_stream)) {
  ph$add_element(data_stream[i])
  if (ph$detected_change()) {
    cat(sprintf("Change has been detected in data: %s - at index: %d\n", data_stream[i], i))
  }
}

[Package datadriftR version 0.0.1 Index]