R: Change-point Detection Tests Based on Records

change.point {RecordTest}

R Documentation

Change-point Detection Tests Based on Records

Description

Performs change-point detection tests based on the record occurrence. The hypothesis of the classical record model (i.e., of IID continuous RVs) is tested against the alternative hypothesis that after a certain time the series stops being IID.

Usage

change.point(
  X,
  weights = function(t) 1,
  record = c("upper", "lower", "d", "s"),
  correct = c("none", "fisher", "vrbik"),
  permutation.test = FALSE,
  simulate.p.value = FALSE,
  B = 1000
)

Arguments

`X`	A numeric vector, matrix (or data frame).
`weights`	A function indicating the weight given to the different records according to their position in the series. Castillo-Mateo (2022) showed that the weights that get more power for this test are `\omega_t = \sqrt{t^2 / (t - 1)}`, i.e., `weights = function(t) ifelse(t == 1, 0, sqrt(t^2 / (t - 1)))` if `record = "upper"` or `= "lower"`. `\omega_t = \sqrt{t}`, i.e., `weights = function(t) sqrt(t)` if `record = "d"` and `\omega_t = \sqrt{t^2 / (t-2)}`, i.e., `weights = function(t) ifelse(t %in% 1:2, 0, sqrt(t^2 / (t-2)))` if `record = "s"`.
`record`	A character string that indicates the type of statistic used. The statistic with `"upper"` or `= "lower"` records, or the statistic with the difference, `"d"`, or the sum, `"s"`, of upper and lower records.
`correct`	A character string that indicates the continuity correction in the Kolmogorov distribution made to the statistic: "fisher" (Fisher and Robbins 2019), "vrbik" (Vrbik 2020) or "none" (the default) if no correction is made. The former shows better size and power, but if the value of the statistic is too large it becomes `NaN` and p-value `NA`.
`permutation.test`	Logical. Indicates whether to compute p-values by permutation simulation (Castillo-Mateo et al. 2023). It does not require that the columns of `X` be independent. If `TRUE` and `simulate.p.value = TRUE`, permutations take precedence and permutations are performed.
`simulate.p.value`	Logical. Indicates whether to compute p-values by Monte Carlo simulation. If `permutation.test = TRUE`, permutations take precedence and permutations are performed.
`B`	If `permutation.test = TRUE` or `simulate.p.value = TRUE`, an integer specifying the number of replicates used in the permutation or Monte Carlo estimation.

Details

The test is implemented as given by Castillo-Mateo (2022). The null hypothesis is that

H_0: p_t = 1/t, \qquad t=1,\ldots,T,

where p_t is the probability of (upper and/or lower) record at time t. The two-sided alternative hypothesis is that

H_1: p_t = 1/t, \quad t=1,\ldots,t_0, \qquad p_t \neq 1/t, \quad t=t_0+1,\ldots,T,

for a change-point t_0.

The variables used for the statistic are

K^{\omega}_T = \max_{1\le t \le T} \left| \frac{N_{t}^{\omega} - \textrm{E}(N_{t}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} - \frac{\textrm{VAR}(N_{t}^{\omega})}{\textrm{VAR}(N_{T}^{\omega})} \frac{N_{T}^{\omega} - \textrm{E}(N_{T}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} \right|,

where N_{t}^\omega = \sum_{m=1}^M \sum_{j=1}^t \omega_j I_{jm}, and the estimated change-point \hat{t}_0 is the value t where K^{\omega}_T attains its maximum.

Argument record indicates if the I_{tm}'s are the "upper" or "lower" record indicators (see I.record). If record = "d" or = "s", N_{t}^\omega is substituted in the expressions above by d_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} - N_{t}^{\omega,(FL)} or s_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} + N_{t}^{\omega,(FL)}, respectively.

The p-value is calculated by means of the asymptotic Kolmogorov distribution. When \omega_t \neq 1, the asymptotic result is not fulfilled. In that case, the p-value should be simulated using permutation or Monte Carlo simulations with the option permutation.test = TRUE or simulate.p.value = TRUE, respectively. Permutations is the only method of calculating p-values that does not require that the columns of X be independent.

As the Kolmogorov distribution is an asymptotic result, it has been seen that the size and power may be a little below than expected, to correct this, any of the continuity corrections can be used:

If correct = "fisher",

K_T = - \sqrt{T} \log\left(1 - \frac{K_T}{\sqrt{T}}\right).

If correct = "vrbik",

K_T = K_T + \frac{1}{6\sqrt{T}} + \frac{K_T - 1}{4T}.

Value

A "htest" object with elements:

`statistic`	Value of the test statistic.
`p.value`	P-value.
`alternative`	The alternative hypothesis.
`estimate`	The estimated change-point time.
`method`	A character string indicating the type of test performed.
`data.name`	A character string giving the name of the data.

Author(s)

Jorge Castillo-Mateo

References

Castillo-Mateo J (2022). “Distribution-Free Changepoint Detection Tests Based on the Breaking of Records.” Environmental and Ecological Statistics, 29(3), 655-676. doi:10.1007/s10651-022-00539-2.

Castillo-Mateo J, Cebrián AC, Asín J (2023). “Statistical Analysis of Extreme and Record-Breaking Daily Maximum Temperatures in Peninsular Spain during 1960–2021.” Atmospheric Research, 293, 106934. doi:10.1016/j.atmosres.2023.106934.

Fisher TJ, Robbins MW (2019). “A Cheap Trick to Improve the Power of a Conservative Hypothesis Test.” The American Statistician, 73(3), 232-242. doi:10.1080/00031305.2017.1395364.

Vrbik J (2020). “Deriving CDF of Kolmogorov-Smirnov Test Statistic.” Applied Mathematics, 11(3), 227-246. doi:10.4236/am.2020.113018.

Examples

change.point(ZaragozaSeries)

change.point(series_split(TX_Zaragoza$TX), record = "d", 	
  weights = function(t) sqrt(t), 
  permutation.test = TRUE, B = 50)

change.point(ZaragozaSeries, record = "d", 
  weights = function(t) sqrt(t), simulate.p.value = TRUE)

test.result <- change.point(rowMeans(ZaragozaSeries))
test.result

## Not run: Load package ggplot2 to plot the changepoint
#library("ggplot2")
#records(rowMeans(ZaragozaSeries)) + 
#  ggplot2::geom_vline(xintercept = test.result$estimate, colour = "red")

[Package RecordTest version 2.2.0 Index]