gstream {gStream} | R Documentation |
Sequential Change-Point Detection based on k-Nearest Neighbors
Description
This function finds change-points in the sequence when the underlying distribution changes. It reports four graph-based test statistics and the analytical approximations for thresholds used in their corresponding stopping rules.
Usage
gstream(distM, L, N0, k, statistics = c("all", "o", "w", "g", "m"),
n0 = 0.3*L, n1 = 0.7*L, ARL = 10000, alpha = 0.05, skew.corr = TRUE, asymp = FALSE)
Arguments
distM |
A distance matrix constructed based on some distance measure. |
L |
The number of observations the k-NN graph will be constructed from. |
N0 |
The number of historical observations. |
k |
A fixed integer used to construct k-NN graph. |
statistics |
The scan statistic to be computed. A character indicating the type of of scan statistic desired. The default is
|
n0 |
The starting index to be considered as a candidate for the change-point. We recommend you set this to be 0.2*L |
n1 |
The ending index to be considered as a candidate for the change-point. For example, n1 = L-n0. |
ARL |
The average run length: the expectation of the stopping rule when there is no change-point. |
alpha |
The probability of an early stop. |
skew.corr |
Default is TRUE. If skew.corr is TRUE, the average run length approximation would incorporate skewness correction. |
asymp |
Default is FALSE. If asymp is TRUE, the average run length approximation will be based on the asymptotic analytical formulas. |
Value
Returns a list with items scanZ
, b
and tauhat
for each type of statistic specified. See below for more details.
scanZ |
A vector of the test statistic (maximum of the scan statistics) for each time n = N0+1,..N.
|
b |
Thresholds used in the stopping rules for each test statistic. These thresholds are based on analytical approximations of the average run length. |
tauhat |
Estimate of the locations of change-points based on the thresholds. |
See Also
Examples
# This example contains two distance matrices (distM1 and distM2).
# Information on how distM1 and distM2 are generated can be found in gStream.
# data(Example)
# Example:
# distM1 is a distance matrix constructed from a dataset with n=40 observation.
# The first 20 observations are treated as historical observations.
# It has been determined that there are no change-points among the
# first 20 observations (see package gSeg for offline change-point detection).
# There is change in mean when tau = 20 (This means a change happens 20 observations
# after we start the tests. We start the test at N0+1 = 21.)
# Uncomment the following to run
# N0 = 20
# L = 20 # the k-nn graph is constructed on only the L most recent observations.
# k = 1
# r1= gstream(distM1, L, N0, k, statistics="all", n0=0.3*L, n1=0.7*L,
# ARL=200,alpha=0.05, skew.corr=TRUE, asymp=FALSE)
# output results based on all four statistics; the scan statistics can be found in r1$scanZ
# r1$tauhat # reports the locations where a change-point is detected
# r1$b # reports the analytical approximations of the thresholds used in the stopping rules
# Set ARL = 10,000
# r1= gstream(distM1, L, N0, k, statistics="all", n0=0.3*L, n1=L-0.3*L,
# ARL=10000,alpha=0.05, skew.corr=TRUE, asymp=FALSE) # uncomment to run this function