seqClosedEndCpDist {npcp} | R Documentation |
Closed-end Sequential Test for Change-Point Detection in Possibly Multivariate Time Series Sensitive to Changes in the Contemporary Distribution Function
Description
Closed-end nonparametric sequential test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). To carry out the test, four steps are required. The first step consists of simulating under the null many trajectories of the detector function. The second step consists of estimating a piecewise constant threshold function from these trajectories. The third step consists of computing the detector function from the data to be monitored. The fourth and last step consists of comparing the detector function with the estimated threshold function. Each of these steps corresponds to one of the functions in the usage section below. The current implementation is preliminary and not optimized for real-time monitoring (but could still be used for that). If the observations to be monitored are univariate and can be assumed serially independent, the simulation of the trajectories of the detector functions can be carried out using Monte Carlo simulation. In all other cases, the test relies on a dependent multiplier bootstrap. Details can be found in the second reference.
Usage
simClosedEndCpDist(x.learn = NULL, m = NULL, n, gamma = 0.25, delta = 1e-4,
B = 1000, method = c("sim", "mult"), b = NULL,
weights = c("parzen", "bartlett"), g = 5,
L.method = c("max","median","mean","min"))
threshClosedEndCpDist(sims, p = 1, alpha = 0.05, type = 7)
detClosedEndCpDist(x.learn, x, gamma = 0.25, delta = 1e-4)
monClosedEndCpDist(det, thresh, statistic = c("mac", "mmc", "mmk", "mk", "mc"),
plot = TRUE)
Arguments
x.learn |
a data matrix whose rows are continuous observations, representing the learning sample. |
m |
a strictly positive integer specifying the size of the
learning sample if |
n |
a strictly positive integer specifying the monitoring horizon;
the monitoring period is |
gamma |
a real parameter between 0 and 0.5 appearing in the definition of the weight function used in the detector function. |
delta |
a real parameter between 0 and 1 appearing in the definition of the weight function used in the detector function. |
B |
the number of trajectories of the detector function to simulate under the null. |
method |
a string specifying the trajectory simulation method;
can be either |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the first reference. The value 1 will
create i.i.d. multiplier sequences suitable for serially independent
observations. If set to |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the first reference. |
g |
a strictly positive integer specifying the number of points of
the uniform grid on |
L.method |
a string specifying how the parameter |
sims |
an object of class |
p |
a strictly positive integer specifying the number of steps of
the piece constant threshold function; |
alpha |
the value of the desired significance level for the sequential test. |
type |
an integer between 1 and 9 selecting one of the nine quantile
algorithms detailed in the help of the function |
x |
a data matrix whose rows are continuous observations corresponding to the new observations to be monitored for a change in contemporary distribution. |
det |
an object of class |
thresh |
an object of class |
statistic |
a string specifying the statistic/detector to be used
for the monitoring; can be either |
plot |
logical indicating whether the monitoring should be plotted. |
Details
The testing procedure is described in detail in the second reference.
Value
All functions return lists whose components have explicit names. The
function monClosedEndCpDist()
in particular returns a list whose
components are
alarm |
a logical indicating whether the detector function has exceeded the threshold function. |
time.alarm |
an integer corresponding to the time at
which the detector function has exceeded the threshold function or
|
times.max |
a vector of times at which the successive detectors
|
time.change |
an integer giving the estimated time of change if
|
Note
This is a test for continuous (multivariate) time series.
References
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
I. Kojadinovic and G. Verdier (2021), Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions, Electronic Journal of Statistics 15(1), pages 773-829, doi:10.1214/21-EJS1798.
See Also
see cpDist()
for the corresponding a posteriori (offline) test.
Examples
## Not run:
## Example of montoring for the period m+1, ..., n
m <- 100 # size of the learning sample
n <- 150 # monitoring horizon
## The learning sample
set.seed(123)
x.learn <- matrix(rnorm(m))
## New observations with a large change in mean
## to simulate monitoring for the period m+1, ..., n
k <- 125 ## the true change-point
x <- matrix(c(rnorm(k-m), rnorm(n-k, mean = 2)))
## Step 1: Simulation of B trajectories of the detector functions under the null
B <- 1e4
## Under the assumption of serial independence
## (no need to specify the learning sample)
traj.sim <- simClosedEndCpDist(m = m, n = n, B = B, method = "sim")
## Without the assumption of serial independence
## (the learning sample is compulsory; the larger it is, the better;
## the monitoring horizon n should not be too large)
traj.mult <- simClosedEndCpDist(x.learn = x.learn, n = n, B = B, method = "mult")
## Step 2: Compute threshold functions with p steps
p <- 2
tf.sim <- threshClosedEndCpDist(traj.sim, p = p)
# p can be taken large if B is very large
tf.mult <- threshClosedEndCpDist(traj.mult, p = p) # p should not be taken too
# large unless both m and B
# are very large
## Step 3: Compute the detectors for the monitoring period m+1, ... , n
det <- detClosedEndCpDist(x.learn = x.learn, x = x)
## Step 4: Monitoring
## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)
## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)
## Simulate the monitoring with the first threshold function
## and another detector function
monClosedEndCpDist(det, tf.sim, statistic = "mmk")
## Alternative steps 3 and 4:
## Compute the detectors for the monitoring period m+1, ... , m+20 only
det <- detClosedEndCpDist(x.learn = x.learn, x = x[1:20,,drop = FALSE])
## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)
## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)
## End(Not run)