KSWIN {datadriftR}    R Documentation

KSWIN (Kolmogorov-Smirnov WINdowing) for Change Detection

Description

Implements Kolmogorov-Smirnov Windowing (KSWIN), which applies the two-sample Kolmogorov-Smirnov (KS) test to a window of streaming data to detect distribution changes. The KS test is non-parametric: it compares two samples and assesses whether they come from the same distribution, without assuming any particular distributional form.

Details

KSWIN detects changes in the underlying distribution of a data stream. It is particularly useful when data properties may evolve over time, because it allows early detection of changes that might affect subsequent data processing.
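As a rough illustration of the underlying test, and not the package's internal code, the following base R sketch applies stats::ks.test() to two pairs of samples. The variable names and the simulated normal data are assumptions chosen for the example.

```r
# Two-sample Kolmogorov-Smirnov test on simulated data (illustrative only).
set.seed(42)
reference      <- rnorm(200, mean = 0, sd = 1)  # stands in for older observations
recent_same    <- rnorm(200, mean = 0, sd = 1)  # drawn from the same distribution
recent_shifted <- rnorm(200, mean = 2, sd = 1)  # mean shifted by two standard deviations

alpha <- 0.005  # significance level, equal to the documented KSWIN default

same_test    <- ks.test(reference, recent_same)
shifted_test <- ks.test(reference, recent_shifted)

# A p-value at or below alpha is taken as evidence of a distribution change.
same_flag    <- same_test$p.value <= alpha
shifted_flag <- shifted_test$p.value <= alpha
```

The KS statistic is the largest gap between the two empirical distribution functions, so a shift in the mean shows up as a larger statistic and a smaller p-value.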

Public fields

alpha

Significance level for the KS test.

window_size

Total size of the data window used for testing.

stat_size

Number of data points sampled from the window for the KS test.

window

Current data window used for change detection.

change_detected

Boolean flag indicating whether a change has been detected.

p_value

P-value of the most recent KS test.

Methods

Public methods


Method new()

Initializes the KSWIN detector with specific settings.

Usage
KSWIN$new(alpha = 0.005, window_size = 100, stat_size = 30, data = NULL)
Arguments
alpha

The significance level for the KS test.

window_size

The size of the data window for change detection.

stat_size

The number of data points sampled from the window for the KS test.

data

Initial data to populate the window, if provided.
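The arguments above are related to one another: the test sample has to fit inside the window, and alpha is a probability. The sketch below spells out those constraints. It is an assumption based on the scikit-multiflow implementation listed in the References, and check_kswin_args() is a hypothetical helper that is not part of datadriftR, which may validate its arguments differently or not at all.

```r
# Hypothetical helper, not part of datadriftR: sanity checks on KSWIN settings.
# Named stopifnot() conditions require R >= 4.0.0.
check_kswin_args <- function(alpha = 0.005, window_size = 100, stat_size = 30) {
  stopifnot(
    "alpha must lie strictly between 0 and 1" = alpha > 0 && alpha < 1,
    "window_size must be positive" = window_size > 0,
    "stat_size must be positive" = stat_size > 0,
    "stat_size must be smaller than window_size" = stat_size < window_size
  )
  invisible(TRUE)
}

check_kswin_args()  # the documented defaults satisfy all four checks
```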


Method reset()

Resets the internal state of the detector to its initial conditions.

Usage
KSWIN$reset()

Method add_element()

Adds a new element to the data window and updates the detection status based on the KS test.

Usage
KSWIN$add_element(x)
Arguments
x

The new data value to add to the window.
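To make the update step concrete, here is a minimal base R sketch of one windowed KS update. It follows the logic of the scikit-multiflow implementation listed in the References as far as that can be inferred; datadriftR's internals may differ. The function kswin_step(), its return structure, and the 0.1 floor on the KS statistic are assumptions for illustration, not the package's API.

```r
# Hypothetical helper, not part of datadriftR: one KSWIN-style update step.
# Assumes window_size is comfortably larger than stat_size.
kswin_step <- function(window, x, alpha = 0.005, window_size = 100,
                       stat_size = 30) {
  change  <- FALSE
  p_value <- NA_real_
  if (length(window) >= window_size) {
    window <- window[-1]                       # drop the oldest value
    n      <- length(window)
    recent <- window[(n - stat_size + 1):n]    # newest stat_size values
    older  <- window[seq_len(n - stat_size)]   # everything before them
    # Draw stat_size values from the older part, with replacement
    sampled <- older[sample.int(length(older), stat_size, replace = TRUE)]
    # Ties can trigger a warning in some R versions; it is suppressed here
    test    <- suppressWarnings(ks.test(sampled, recent))
    p_value <- test$p.value
    if (p_value <= alpha && test$statistic > 0.1) {
      change <- TRUE
      window <- recent  # after a detection, keep only the newest values
    }
  }
  list(window = c(window, x), change = change, p_value = p_value)
}

# Feed a simulated stream whose mean shifts halfway through
set.seed(1)
stream     <- c(rnorm(300, mean = 0), rnorm(300, mean = 3))
window     <- numeric(0)
detections <- integer(0)
for (i in seq_along(stream)) {
  step   <- kswin_step(window, stream[i])
  window <- step$window
  if (step$change) detections <- c(detections, i)
}
```

No test is run until the window holds window_size values, so in this sketch the first window_size elements can never trigger a detection. Because a test runs on every later element, a small alpha helps limit false alarms.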


Method detected_change()

Checks if a change has been detected based on the most recent KS test.

Usage
KSWIN$detected_change()
Returns

Boolean indicating whether a change was detected.


Method clone()

The objects of this class are cloneable with this method.

Usage
KSWIN$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Christoph Raab, Moritz Heusinger, Frank-Michael Schleif, Reactive Soft Prototype Computing for Concept Drift Streams, Neurocomputing, 2020.

Implementation: https://github.com/scikit-multiflow/scikit-multiflow/blob/a7e316d1cc79988a6df40da35312e00f6c4eabb2/src/skmultiflow/drift_detection/kswin.py

Examples

set.seed(123)  # Setting a seed for reproducibility
data_part1 <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.7, 0.3))

# Introduce a change in data distribution
data_part2 <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.3, 0.7))

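# Combine the two parts
data_stream <- c(data_part1, data_part2)

# Feed the stream to a KSWIN detector and report where changes are flagged
kswin <- KSWIN$new(alpha = 0.005, window_size = 100, stat_size = 30)
for (i in seq_along(data_stream)) {
  kswin$add_element(data_stream[i])
  if (kswin$detected_change()) cat(sprintf("Change detected at index %d\n", i))
}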

[Package datadriftR version 0.0.1 Index]