R: Multiple change-point detection in the mean via thresholding

pcm_th {IDetect}

R Documentation

Multiple change-point detection in the mean via thresholding

Description

This function performs the Isolate-Detect methodology (see Details for the relevant literature reference) with the thresholding-based stopping rule to detect multiple change-points in the mean of a noisy input vector x, where the noise is assumed to be Gaussian. See Details for a brief explanation of the Isolate-Detect methodology and of the thresholding-based stopping rule.

Usage

pcm_th(x, sigma = stats::mad(diff(x)/sqrt(2)), thr_const = 1,
  thr_fin = sigma * thr_const * sqrt(2 * log(length(x))), s = 1,
  e = length(x), points = 3, k_l = 1, k_r = 1)

Arguments

`x`	A numeric vector containing the data in which you would like to find change-points.
`sigma`	A positive real number. It is the estimate of the standard deviation of the noise in `x`. The default value is the median absolute deviation of the first-order differences of `x` divided by `sqrt(2)`, that is `stats::mad(diff(x)/sqrt(2))`, computed under the assumption that the noise is independent and identically distributed from the Gaussian distribution.
`thr_const`	A positive real number with default value equal to 1. It is used to define the threshold; see `thr_fin`.
`thr_fin`	With `T` the length of the data sequence, this is a positive real number with default value equal to `sigma * thr_const * sqrt(2 * log(T))`. It is the threshold, which is used in the detection process.
`s`, `e`	Positive integers with `s` less than `e`, which indicate that you want to check for change-points in the data sequence with subscripts in `[s,e]`. The default values are `s` equal to 1 and `e` equal to `T`, with `T` the length of the data sequence.
`points`	A positive integer with default value equal to 3. It defines the distance between two consecutive end- or start-points of the right- or left-expanding intervals, respectively; see Details for more information.
`k_l`, `k_r`	Positive integer numbers that get updated whenever the function calls itself during the detection process. They are not essential for the function to work, and we include them only to reduce the computational time.
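
As a rough illustration of how the default values of `sigma` and `thr_fin` shown in Usage fit together, the following base-R sketch recomputes them by hand. The simulated vector and the local variable names (`sigma_hat`, `thr_default`) are assumptions made for illustration only; nothing here calls the package itself.

```r
# Illustrative data (assumption): one change in the mean plus N(0, 1) noise.
set.seed(1)
x <- c(rep(4, 1000), rep(0, 1000)) + rnorm(2000)

# Differences of i.i.d. N(0, sigma^2) noise have variance 2 * sigma^2, so
# dividing by sqrt(2) brings them back to the scale of sigma. Using the MAD
# keeps the estimate robust to the few large differences at change-points.
sigma_hat <- stats::mad(diff(x) / sqrt(2))

# Default threshold from Usage: sigma * thr_const * sqrt(2 * log(T)).
thr_const <- 1
thr_default <- sigma_hat * thr_const * sqrt(2 * log(length(x)))

print(c(sigma_hat = sigma_hat, thr_default = thr_default))
```

Raising `thr_const` above 1 makes the threshold larger, so fewer points are declared change-points; lowering it has the opposite effect.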

Details

The change-point detection algorithm used in pcm_th is the Isolate-Detect (ID) methodology described in “Detecting multiple generalized change-points by isolating single ones”, Anastasiou and Fryzlewicz (2018), preprint. The concept is simple and has two stages: first, each of the true change-points is isolated in a subinterval of the data domain; second, it is detected. With `T` the length of the data sequence, ID first creates two ordered sets of `K = ceiling(T / points)` right- and left-expanding intervals as follows. The j-th right-expanding interval is `R_j = [1, j * points]`, while the j-th left-expanding interval is `L_j = [T - j * points + 1, T]`. We collect these intervals in the ordered set `S_RL = {R_1, L_1, R_2, L_2, ..., R_K, L_K}`. For a suitably chosen contrast function, ID first identifies the point with the maximum contrast value in `R_1`. If that value exceeds the threshold `thr_fin`, the point is taken as a change-point. If not, the process moves to the next interval in `S_RL` and repeats the same test. Upon detection, the algorithm makes a new start from the estimated location.
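
The interval construction and the first detection step described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation. The helper names `expanding_intervals`, `cusum` and `first_detection` are hypothetical, the CUSUM statistic is an assumed choice of contrast function for changes in the mean, interval end-points are clipped to `[1, T]` as an additional assumption, and the sketch stops at the first detection instead of restarting from the estimated location.

```r
# Hypothetical helper: the ordered set S_RL as a (start, end) matrix with rows
# R_1, L_1, R_2, L_2, ..., R_K, L_K. End-points are clipped to [1, n]
# (an assumption, so that every interval stays inside the data).
expanding_intervals <- function(n, points = 3) {
  K <- ceiling(n / points)
  j <- seq_len(K)
  right <- cbind(start = 1, end = pmin(j * points, n))
  left <- cbind(start = pmax(n - j * points + 1, 1), end = n)
  # Sort by j first, then put the right-expanding interval before the left one.
  idx <- order(rep(j, 2), rep(1:2, each = K))
  rbind(right, left)[idx, , drop = FALSE]
}

# Assumed contrast: absolute CUSUM statistic for a split after position b,
# with s <= b < e.
cusum <- function(x, s, e, b) {
  n <- e - s + 1
  n_left <- b - s + 1
  n_right <- e - b
  abs(sqrt(n_right / (n * n_left)) * sum(x[s:b]) -
        sqrt(n_left / (n * n_right)) * sum(x[(b + 1):e]))
}

# Hypothetical helper: scan S_RL in order and return the first location whose
# contrast exceeds thr, or NA if no interval produces an exceedance.
first_detection <- function(x, thr, points = 3) {
  iv <- expanding_intervals(length(x), points)
  for (i in seq_len(nrow(iv))) {
    s <- iv[i, "start"]
    e <- iv[i, "end"]
    if (e - s < 1) next  # a single point cannot be split
    b <- s:(e - 1)
    vals <- vapply(b, function(bb) cusum(x, s, e, bb), numeric(1))
    if (max(vals) > thr) return(unname(b[which.max(vals)]))
  }
  NA_integer_
}

# Illustrative use on simulated data (assumption): one change after t = 100.
set.seed(1)
x_demo <- c(rep(0, 100), rep(5, 100)) + rnorm(200)
thr_demo <- stats::mad(diff(x_demo) / sqrt(2)) * sqrt(2 * log(length(x_demo)))
print(first_detection(x_demo, thr_demo))
```

Because the intervals grow by `points` observations at a time, an interval that first covers a change-point contains only a few observations beyond it, which is what keeps that change-point isolated from the others.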

Value

A numeric vector with the detected change-points.

Author(s)

Andreas Anastasiou, a.anastasiou@lse.ac.uk

Examples

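# One change-point at t = 1000, observed with standard Gaussian noise
single.cpt <- c(rep(4, 1000), rep(0, 1000))
single.cpt.noise <- single.cpt + rnorm(2000)
cpt.single.th <- pcm_th(single.cpt.noise)

# Three change-points, at t = 500, 1000 and 1500
three.cpt <- c(rep(4, 500), rep(0, 500), rep(-4, 500), rep(1, 500))
three.cpt.noise <- three.cpt + rnorm(2000)
cpt.three.th <- pcm_th(three.cpt.noise)

# Many change-points: the mean alternates between 0 and 3 every 50 observations
multi.cpt <- rep(c(rep(0, 50), rep(3, 50)), 20)
multi.cpt.noise <- multi.cpt + rnorm(2000)
cpt.multi.th <- pcm_th(multi.cpt.noise)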

[Package IDetect version 0.1.0 Index]