R: Cross-validation with Order-Preserved Sample-Splitting

COPPS {crossvalidationCP}

R Documentation

Cross-validation with Order-Preserved Sample-Splitting

Description

Tuning parameters are selected by a generalised COPPS procedure. All functions use Order-Preserved Sample-Splitting, meaning that the folds are the odd- and even-indexed observations. The three functions differ in the cross-validation criterion they use. COPPS is the original COPPS procedure of Zou et al. (2020), i.e. it uses quadratic error loss. CV1 and CVmod use absolute error loss and the modified quadratic error loss, respectively.
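
The order-preserved split described above can be sketched in base R. This is only an illustration of how odd- and even-indexed folds arise, not the package's internal code; the variable names and the single-segment "fit" by the fold mean are assumptions made for the example.

```r
# Illustrative sketch in base R; no crossvalidationCP functions are called.
set.seed(1)
Y <- c(rnorm(6), rnorm(6, mean = 5))  # 12 observations, one change after index 6

idx     <- seq_along(Y)
oddIdx  <- idx[idx %% 2 == 1]  # 1, 3, 5, ...
evenIdx <- idx[idx %% 2 == 0]  # 2, 4, 6, ...

# Each fold keeps the original ordering of the observations, so the change
# after observation 6 shows up after position 3 in both folds.
Yodd  <- Y[oddIdx]
Yeven <- Y[evenIdx]

# Assumed toy fit with zero change-points: predict the test fold by the mean
# of the training fold, then swap the roles of the two folds.
quadOdd  <- sum((Yodd - mean(Yeven))^2)   # odd observations in the test set
quadEven <- sum((Yeven - mean(Yodd))^2)   # even observations in the test set
absOdd   <- sum(abs(Yodd - mean(Yeven)))  # absolute error loss instead
```

Because both folds preserve the time order, a change-point estimated on one fold can be compared directly with the observations of the other fold.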

Usage

COPPS(Y, param = 5L, estimator = leastSquares,
      output = c("param", "fit", "detailed"), ...)
CV1(Y, param = 5L, estimator = leastSquares,
    output = c("param", "fit", "detailed"), ...)
CVmod(Y, param = 5L, estimator = leastSquares,
      output = c("param", "fit", "detailed"), ...)

Arguments

`Y`	the observations. Can be any data type that supports the function `length` and the operator `[]` and that can be passed to `estimator` and the cross-validation criterion, e.g. a numeric vector or a list. Support for matrices, i.e. for multivariate data, is planned but not yet implemented
`param`	a `list` giving the possible tuning parameters. Alternatively, a single integer which will be interpreted as the maximal number of change-points and converted to `as.list(0:param)`
`estimator`	a function providing a local estimate. For pre-implemented estimators see estimators. The function must have the arguments `Y`, `param` and `...`, where `Y` will be a subset of the observations, and `param` and `...` will be the corresponding arguments of the called function. Note that `...` will be passed to both `estimator` and the cross-validation criterion. The return value must be either a list of length `length(param)`, with each entry containing the estimated change-point locations for the corresponding entry in `param`, or a list containing the named entries `cps` and `value`. In the latter case, `cps` has to be a list of the estimated change-points as before, and `value` has to be a list of the locally estimated values for each entry in `param`, i.e. each list entry has to be a list itself whose length is one more than that of the corresponding entry in `cps`. The function `convertSingleParam` converts an estimator that accepts a single parameter into an estimator that accepts multiple parameters
`output`	a string specifying the output, either `"param"`, `"fit"` or `"detailed"`. For details on what they mean, see Value
`...`	additional parameters that are passed to `estimator` and the `cross-validation criterion`
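
The conversion of `param` and the two return formats allowed for `estimator` might look as follows. This is a minimal sketch in base R: `toyEstimator` and `toyEstimatorWithValues` are hypothetical helpers invented for illustration and are not part of the package, the "largest jumps" rule is a deliberately naive stand-in for a real change-point method, and using segment means as the local values is an assumption.

```r
# A single integer is interpreted as the maximal number of change-points.
param     <- 3L
paramList <- as.list(0:param)  # list(0L, 1L, 2L, 3L)

# Hypothetical toy estimator (not from crossvalidationCP): for each k in
# param, report the positions of the k largest absolute jumps in Y.
# A change-point at i means that Y[i] is the last value of its segment.
toyEstimator <- function(Y, param, ...) {
  jumps <- abs(diff(Y))  # jumps[i] is the jump between Y[i] and Y[i + 1]
  lapply(param, function(k) {
    k <- min(k, length(jumps))
    if (k == 0) return(integer(0))
    sort(order(jumps, decreasing = TRUE)[seq_len(k)])
  })
}

# Second allowed format: named entries cps and value, where each entry of
# value is a list that is one element longer than the matching cps entry.
toyEstimatorWithValues <- function(Y, param, ...) {
  cps <- toyEstimator(Y, param, ...)
  value <- lapply(cps, function(cp) {
    starts <- c(1L, cp + 1L)
    ends   <- c(cp, length(Y))
    # assumed local value: the mean of each segment
    Map(function(s, e) mean(Y[s:e]), starts, ends)
  })
  list(cps = cps, value = value)
}

Y   <- c(0, 0, 0, 5, 5, 5, 1, 1, 1)
res <- toyEstimator(Y, paramList)
out <- toyEstimatorWithValues(Y, paramList)
```

A function of this shape has the argument list described above; whether such a toy rule gives useful selections in practice is a separate question.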

Value

If output == "param", the selected tuning parameter, i.e. an entry from param. If output == "fit", a list with the entries param, giving the selected tuning parameter, and fit. The named entry fit is a list giving the fit obtained by applying estimator to the whole data Y with the selected tuning parameter. The returned value is transformed to a list with an entry cps giving the estimated change-points and, if provided by estimator, an entry value giving the estimated local values. If output == "detailed", the same as for output == "fit", but with the additional entries CP, CVodd, and CVeven giving the calculated cross-validation criteria for all parameter entries. CVodd and CVeven are the criteria when the odd and even observations, respectively, are in the test set. CP is the sum of those two.
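
How the entries of the detailed output relate to each other can be sketched with made-up numbers. The criterion values below are invented purely for illustration and are not output of the package; that the selected parameter is the one minimising CP is an assumption based on the usual cross-validation principle.

```r
# Invented criterion values for four candidate parameters (0 to 3 change-points).
param  <- as.list(0:3)
CVodd  <- c(410.2, 95.1, 60.3, 61.0)  # odd observations in the test set
CVeven <- c(398.7, 99.4, 58.9, 62.5)  # even observations in the test set

# CP is the sum of the two criteria, one value per entry of param.
CP <- CVodd + CVeven

# Assumed selection rule: take the parameter with the smallest CP.
selected <- param[[which.min(CP)]]
```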

References

Pein, F., and Shah, R. D. (2021) Cross-validation for change-point regression: pitfalls and solutions. arXiv:2112.03220.

Zou, C., Wang, G., and Li, R. (2020) Consistent selection of the number of change-points via sample-splitting. The Annals of Statistics, 48(1), 413–439.

Examples

# call with default parameters:
# 2-fold cross-validation with ordered folds, absolute error loss,
# least squares estimation, and possible parameters being 0 to 5 change-points
CV1(Y = rnorm(100))
# the same, but with modified quadratic error loss
CVmod(Y = rnorm(100))
# the same, but with quadratic error loss, identical to the COPPS procedure
COPPS(Y = rnorm(100))

# more interesting data and more detailed output
set.seed(1L)
Y <- c(rnorm(50), rnorm(50, 5), rnorm(50), rnorm(50, 5))
CV1(Y = Y, output = "detailed")
# finds the correct change-points at 50, 100, 150
# (plus the start and end points 0 and 200)

# list of parameters, only allowing 1 or 2 change-points
CVmod(Y = Y, param = as.list(1:2))

# COPPS potentially fails to provide a good selection when large changes
# occur at odd locations
# Example 1 in Pein and Shah (2021); see Section 2.2 of that paper for more details
set.seed(1)
exampleY <- rnorm(102, c(rep(10, 46), rep(0, 5), rep(30, 51)))
# misses one change-point
COPPS(Y = exampleY) 

# correct number of change-points when modified criterion (or absolute error loss) is used
CVmod(Y = exampleY)

# PELT as a local estimator instead of least squares estimation
# param must contain parameters that are acceptable for the given estimator
CV1(Y = Y, estimator = pelt, output = "detailed",
    param = list("SIC", "MBIC", 3 * log(length(Y))))

# argument minseglen of pelt specified in ...
CVmod(Y = Y, estimator = pelt, output = "detailed",
      param = list("SIC", "MBIC", 3 * log(length(Y))),
      minseglen = 30)

[Package crossvalidationCP version 1.1 Index]