ID {IDetect} | R Documentation |
Multiple change-point detection in piecewise-constant or continuous, piecewise-linear signals using the Isolate-Detect methodology
Description
This is the main, general function of the package. It employs more specialised functions in order to estimate the number and locations of multiple change-points in the noisy, piecewise-constant or continuous, piecewise-linear input vector xd. The noise can be Gaussian or heavy-tailed. The method is a hybrid of the thresholding approach (explained in pcm_th and cplm_th) and the information criterion approach (explained in pcm_ic and cplm_ic), and it estimates the change-points by taking both approaches into account. In addition to the number and locations of the estimated change-points, ID returns the estimated signal and the solution path.
For more information and the relevant literature reference, see Details.
Usage
ID(xd, th.cons = 1, th.cons_lin = 1.4, th.ic = 0.9, th.ic.lin = 1.25,
lambda = 3, lambda.ic = 10, contrast = c("mean", "slope"), ht = FALSE,
scale = 3)
Arguments
xd | A numeric vector containing the data in which you would like to find change-points. |
th.cons | A positive real number with default value equal to 1. It is used to define the threshold if the thresholding approach (explained in pcm_th) is to be followed. |
th.cons_lin | A positive real number with default value equal to 1.4. It is used to define the threshold if the thresholding approach (explained in cplm_th) is to be followed. |
th.ic | A positive real number with default value equal to 0.9. It is useful only if the model selection based Isolate-Detect method (described in pcm_ic) is to be followed. |
th.ic.lin | A positive real number with default value equal to 1.25. It is useful only if the model selection based Isolate-Detect method (described in cplm_ic) is to be followed. |
lambda | A positive integer with default value equal to 3. It is used only when the threshold-based approach is to be followed, and it defines the distance between two consecutive end-points of the right-expanding intervals or start-points of the left-expanding intervals. |
lambda.ic | A positive integer with default value equal to 10. It is used only when the information criterion based approach is to be followed, and it defines the distance between two consecutive end-points of the right-expanding intervals or start-points of the left-expanding intervals. |
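contrast | A character string which defines the type of the contrast function to be used in the Isolate-Detect algorithm. If contrast = "mean", the algorithm looks for change-points in a piecewise-constant signal; if contrast = "slope", it looks for change-points in a continuous, piecewise-linear signal. |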
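ht | A logical variable with default value equal to FALSE. If FALSE, the noise is assumed to be Gaussian. If TRUE, the noise may be heavy-tailed, and the data are pre-averaged using normalise before the change-points are detected. |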
scale | A positive integer with default value equal to 3. It is used to define the way the given data sequence is pre-averaged, and only if ht = TRUE. See normalise for more details. |
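The lambda and lambda.ic arguments set the spacing of the expanding intervals. As an illustration only, the hypothetical helper expanding_intervals() below (it is not part of IDetect) lists, for an interval [s, e] and spacing lambda, the end-points of the right-expanding intervals [s, s + k*lambda] and the start-points of the left-expanding intervals [e - k*lambda, e], both capped at the interval boundary. It assumes integer s < e and does not reproduce the order in which ID alternates between the two sets.

```r
# Hypothetical helper (not exported by IDetect): right- and left-expanding
# intervals on [s, e] whose consecutive end-/start-points are lambda apart.
expanding_intervals <- function(s, e, lambda) {
  # End-points s + lambda, s + 2*lambda, ..., with e always included last
  ends <- if (s + lambda <= e) seq(s + lambda, e, by = lambda) else numeric(0)
  ends <- unique(c(ends, e))
  # Start-points e - lambda, e - 2*lambda, ..., with s always included last
  starts <- if (e - lambda >= s) seq(e - lambda, s, by = -lambda) else numeric(0)
  starts <- unique(c(starts, s))
  list(right = cbind(start = s, end = ends),    # intervals [s, end]
       left  = cbind(start = starts, end = e))  # intervals [start, e]
}

# Example: spacing 3 on [1, 10]
expanding_intervals(1, 10, 3)
```

A smaller spacing gives more intervals to scan, and therefore more computation, but a finer search.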
Details
The data points provided in xd are assumed to follow

X_t = f_t + \sigma\epsilon_t, \quad t = 1, 2, \ldots, T,

where T is the total length of the data sequence, X_t are the observed data, f_t is a one-dimensional, deterministic signal with abrupt structural changes at certain points, and \epsilon_t are independent and identically distributed random variables with mean zero and variance one. In this function, the following scenarios for f_t are implemented.
- Piecewise-constant signal with Gaussian noise. Use contrast = "mean" and ht = FALSE.
- Piecewise-constant signal with heavy-tailed noise. Use contrast = "mean" and ht = TRUE.
- Continuous, piecewise-linear signal with Gaussian noise. Use contrast = "slope" and ht = FALSE.
- Continuous, piecewise-linear signal with heavy-tailed noise. Use contrast = "slope" and ht = TRUE.
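For the piecewise-constant scenarios, a natural contrast is the standard CUSUM statistic for a change in mean. On an interval [s, e] with n = e - s + 1, it is C_{s,e}^b(X) = \sqrt{(e-b)/(n(b-s+1))} \sum_{t=s}^{b} X_t - \sqrt{(b-s+1)/(n(e-b))} \sum_{t=b+1}^{e} X_t for b = s, ..., e - 1. The base-R sketch below assumes this is the contrast behind contrast = "mean"; it is an illustration, not IDetect's internal code.

```r
# Sketch: CUSUM contrast for a change in mean on [s, e], evaluated at every
# candidate split point b = s, ..., e - 1 (illustration, not IDetect internals).
cusum_mean <- function(x, s, e) {
  n <- e - s + 1
  b <- s:(e - 1)
  left_len <- b - s + 1                 # number of observations in [s, b]
  right_len <- e - b                    # number of observations in [b + 1, e]
  cs <- cumsum(x[s:e])
  left_sum <- cs[left_len]              # sum of x over [s, b]
  right_sum <- cs[n] - left_sum         # sum of x over [b + 1, e]
  sqrt(right_len / (n * left_len)) * left_sum -
    sqrt(left_len / (n * right_len)) * right_sum
}

# Noiseless step after t = 50: |CUSUM| peaks at the true change-point
x <- c(rep(0, 50), rep(3, 50))
which.max(abs(cusum_mean(x, 1, 100)))   # 50
```

On a constant stretch the statistic is zero for every b, which is why a large absolute value over an interval signals a change in mean.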
In the case where ht = FALSE, the function first detects the change-points using win_pcm_th (for a piecewise-constant signal) or win_cplm_th (for a continuous, piecewise-linear signal). If more than 100 change-points are estimated, this result is returned and the procedure stops. Otherwise, ID proceeds to detect the change-points using pcm_ic (for a piecewise-constant signal) or cplm_ic (for a continuous, piecewise-linear signal), and returns that result.
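The decision rule for ht = FALSE can be sketched in base R. Here threshold_fit and ic_fit are hypothetical function arguments standing in for the thresholding detector (win_pcm_th or win_cplm_th) and the information criterion detector (pcm_ic or cplm_ic); the sketch does not call those functions or assume anything about their signatures.

```r
# Sketch of the hybrid rule: run the thresholding detector first; if it finds
# more than max_cpt change-points, return that result, otherwise run and
# return the information criterion detector. Both detectors are hypothetical
# stand-ins passed in as functions of the data.
hybrid_detect <- function(x, threshold_fit, ic_fit, max_cpt = 100) {
  cpt_th <- threshold_fit(x)
  if (length(cpt_th) > max_cpt) {
    return(cpt_th)          # many change-points: keep the thresholding result
  }
  ic_fit(x)                 # otherwise: information criterion result
}

# Toy stand-ins that ignore the data, just to exercise both branches
hybrid_detect(numeric(10), function(x) 1:150, function(x) 5L)  # 1:150
hybrid_detect(numeric(10), function(x) 1:3, function(x) 5L)    # 5
```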
In the case where ht = TRUE, the given data sequence is first pre-averaged using normalise, and exactly the same procedure as for ht = FALSE above is then applied to the resulting sequence.
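The idea of pre-averaging can be illustrated with a simple block mean. The hypothetical pre_average() below replaces each block of scale consecutive observations by its mean and drops a trailing incomplete block; this is an assumption made for illustration, and the package's normalise may differ (for example, in how it rescales the averages).

```r
# Hypothetical pre-averaging (not IDetect's normalise): replace each block
# of `scale` consecutive observations by its mean; a trailing incomplete
# block is dropped.
pre_average <- function(x, scale = 3) {
  m <- length(x) %/% scale              # number of complete blocks
  x <- x[seq_len(m * scale)]
  blocks <- matrix(x, nrow = scale)     # column j holds block j
  colMeans(blocks)
}

# Heavy-tailed example: averaging 3 observations at a time
set.seed(1)
y <- c(rep(4, 300), rep(0, 300)) + rt(600, df = 5)
length(pre_average(y, scale = 3))       # 200
```

Averaging tames the heavy tails of the noise, at the cost of resolution. Under this sketch, a change-point found at index j of the pre-averaged series corresponds roughly to position j * scale in the original data.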
More details can be found in “Detecting multiple generalized change-points by isolating single ones”,
Anastasiou and Fryzlewicz (2018), preprint.
Value
A list with the following components:
cpt | A vector with the detected change-points. |
no_cpt | The number of change-points detected. |
fit | A numeric vector with the estimated signal. |
solution_path | A vector containing the solution path. |
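For the piecewise-constant case, one natural way to turn a vector of change-points into a fitted signal is to take the sample mean of each segment. The hypothetical helper fit_pcm() below does this. It assumes each entry of cpt is the last index of its segment, and it is not necessarily how ID computes fit; in particular, it does not cover the continuous, piecewise-linear case.

```r
# Hypothetical helper: piecewise-constant fit from data x and change-points
# cpt, assuming each change-point is the last index of its segment.
fit_pcm <- function(x, cpt) {
  ends <- c(sort(cpt), length(x))       # segment end-points
  starts <- c(1, head(ends, -1) + 1)    # segment start-points
  fit <- numeric(length(x))
  for (i in seq_along(ends)) {
    idx <- starts[i]:ends[i]
    fit[idx] <- mean(x[idx])            # segment-wise sample mean
  }
  fit
}

fit_pcm(c(1, 1, 1, 5, 5), cpt = 3)      # 1 1 1 5 5
```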
Author(s)
Andreas Anastasiou, a.anastasiou@lse.ac.uk
See Also
ID_pcm, ID_cplm, ht_ID_pcm, and ht_ID_cplm, which are the functions employed in ID, depending on which scenario is imposed by the input arguments.
Examples
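library(IDetect)
set.seed(1)  # for reproducibility

# Piecewise-constant signal with a single change-point at t = 3000
single.cpt.mean <- c(rep(4, 3000), rep(0, 3000))
single.cpt.mean.normal <- single.cpt.mean + rnorm(6000)        # Gaussian noise
single.cpt.mean.student <- single.cpt.mean + rt(6000, df = 5)  # heavy-tailed noise
cpt.single.mean.normal <- ID(single.cpt.mean.normal)
cpt.single.mean.student <- ID(single.cpt.mean.student, ht = TRUE)
cpt.single.mean.normal$cpt  # detected change-points

# Continuous, piecewise-linear signal with a single change in slope at t = 2000
single.cpt.slope <- c(seq(0, 1999, 1), seq(1998, -1, -1))
single.cpt.slope.normal <- single.cpt.slope + rnorm(4000)        # Gaussian noise
single.cpt.slope.student <- single.cpt.slope + rt(4000, df = 5)  # heavy-tailed noise
cpt.single.slope.normal <- ID(single.cpt.slope.normal, contrast = "slope")
cpt.single.slope.student <- ID(single.cpt.slope.student, contrast = "slope", ht = TRUE)
cpt.single.slope.normal$no_cpt  # number of detected change-points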