crossvalidationCP {crossvalidationCP}    R Documentation

Cross-validation in change-point regression

Description

Generic function for cross-validation to select tuning parameters in change-point regression. From a list of candidate parameters, it selects the one with the smallest cross-validation criterion for a given method. The cross-validation criterion, the estimator, and the folds can be specified by the user.

Usage

crossvalidationCP(Y, param = 5L, folds = 5L, estimator = leastSquares,
                  criterion = criterionL1loss,
                  output = c("param", "fit", "detailed"), ...)

Arguments

Y

the observations, can be any data type that supports the function length and the operator [] and can be passed to estimator and criterion, e.g. a numeric vector or a list. Support for matrices, i.e. for multivariate data, is planned but not implemented so far

param

a list giving the possible tuning parameters. Alternatively, a single integer which will be interpreted as the maximal number of change-points and converted to as.list(0:param). All values have to be acceptable values for the specified estimator
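
As a small illustration of the conversion described above, the following base-R lines show what a single integer expands to. The variable names are only for this sketch and are not part of the package.

```r
# A single integer is interpreted as the maximal number of change-points ...
maxCps <- 5L

# ... and is expanded to the candidate list 0, 1, ..., maxCps
candidates <- as.list(0:maxCps)

length(candidates)   # number of candidate parameters, here maxCps + 1
```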

folds

either a list, a single integer or the string "COPPS" specifying the folds. If a list, each entry should be an integer vector with values between 1 and length(Y) giving the indices of the observations in the fold. A single integer specifies the number of folds; ordered folds are then created automatically, i.e. fold i will be seq(i, length(Y), folds). "COPPS" means that a generalised COPPS procedure (Zou et al., 2020) will be used, i.e. 2-fold cross-validation with Order-Preserved Sample-Splitting, in which the folds are the odd-indexed and the even-indexed observations. Note that observations will be given in reverse order to the cross-validation criterion when the odd-indexed observations are in the test set. This allows criteria such as the modified criterion, for which the first observation is removed for the odd-indexed and the last observation for the even-indexed observations
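
The fold layouts described above can be reproduced in base R. This is only a sketch of the index sets, not the package's internal code; n, V, orderedFolds and coppsFolds are names chosen for illustration.

```r
n <- 10L   # hypothetical number of observations
V <- 3L    # hypothetical number of folds

# ordered folds: fold i holds the indices i, i + V, i + 2 * V, ...
orderedFolds <- lapply(seq_len(V), function(i) seq(i, n, V))

# COPPS-style folds: the odd-indexed and the even-indexed observations
coppsFolds <- list(seq(1L, n, 2L), seq(2L, n, 2L))

orderedFolds
coppsFolds
```

A list built this way has the shape expected when folds is given as a list: one integer vector of indices per fold.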

estimator

a function providing a local estimate. For pre-implemented estimators see estimators. The function must have the arguments Y, param and ..., where Y will be a subset of the observations, and param and ... will be the corresponding arguments of the called function. Note that ... will be passed to both estimator and criterion. The return value must be either a list of length length(param), with each entry containing the estimated change-point locations for the corresponding entry in param, or a list containing the named entries cps and value. In the latter case, cps has to be a list of the estimated change-points as before, and value has to be a list of the locally estimated values for each entry in param, i.e. each entry of value has to be a list itself, with one more entry than the corresponding entry in cps. The function convertSingleParam converts an estimator that accepts a single parameter into an estimator that accepts multiple parameters
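
To make the second return format concrete, here is a deliberately naive toy estimator written in base R. It is hypothetical and not part of the package: it places the change-points at equally spaced positions without any optimisation, and it assumes that a change-point is reported as the index of the last observation of a segment. Only the shape of the returned list is the point of the sketch.

```r
# Hypothetical toy estimator: each entry k of param is read as a number of
# change-points, which are placed at equally spaced positions (requires k < length(Y)).
toyEstimator <- function(Y, param, ...) {
  n <- length(Y)
  cps <- lapply(param, function(k) {
    if (k == 0) integer(0) else as.integer(floor(seq_len(k) * n / (k + 1)))
  })
  value <- lapply(cps, function(cp) {
    left <- c(0L, cp) + 1L   # first index of each segment
    right <- c(cp, n)        # last index of each segment
    # one local value (the segment mean) per segment
    lapply(seq_along(left), function(j) mean(Y[left[j]:right[j]]))
  })
  list(cps = cps, value = value)
}

fit <- toyEstimator(Y = c(rep(0, 4), rep(5, 4)), param = as.list(0:1))
str(fit)
```

Each entry of fit$value is a list with one more entry than the matching entry of fit$cps, which is the relationship required above.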

criterion

a function providing the cross-validation criterion. For pre-implemented criteria see criteria. The function must have the arguments testset, estset and value. testset and estset are the observations of one segment that are in the test and estimation set, respectively. value is the local parameter on the segment if provided by estimator, otherwise NULL. Additionally, ... is possible and potentially necessary to absorb arguments, since the argument ... of crossvalidationCP will be passed to both estimator and criterion. The function must return a single numeric value. The return values will be summed for each parameter, and which.min will be called on the resulting vector to determine the parameter with the smallest criterion, hence some NaN values etc. are allowed
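
The following base-R sketch shows a criterion with the required arguments and why undefined values are tolerated. sketchL2 is a hypothetical quadratic-loss function written for this illustration; it is not the package's criterionL2loss.

```r
# Hypothetical quadratic-loss criterion with the arguments testset, estset, value
sketchL2 <- function(testset, estset, value = NULL, ...) {
  # fall back to the mean of the estimation set if no local value is supplied
  if (is.null(value)) value <- mean(estset)
  sum((testset - value)^2)
}

sketchL2(testset = c(1, 3), estset = c(2, 2, 2))

# which.min() discards NA and NaN entries, so an undefined criterion for one
# parameter does not prevent a selection among the remaining ones
which.min(c(4.2, NaN, 1.5))
```

The ... in the signature absorbs any additional arguments that are meant for the estimator only.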

output

a string specifying the output, either "param", "fit" or "detailed". For details on what they mean, see Value

...

additional parameters that are passed to estimator and criterion

Value

If output == "param", the selected tuning parameter, i.e. an entry from param. If output == "fit", a list with the entries param, giving the selected tuning parameter, and fit. The named entry fit is a list giving the fit obtained by applying estimator to the whole data Y with the selected tuning parameter. The returned value is transformed to a list with an entry cps giving the estimated change-points and, if provided by estimator, an entry value giving the estimated local values. If output == "detailed", the same as for output == "fit", but additionally an entry CP giving all calculated cross-validation criteria. Those values are summed over all folds

References

Pein, F., and Shah, R. D. (2021) Cross-validation for change-point regression: pitfalls and solutions. arXiv:2112.03220.

Zou, C., Wang, G., and Li, R. (2020) Consistent selection of the number of change-points via sample-splitting. The Annals of Statistics, 48(1), 413–439.

See Also

estimators, criteria, convertSingleParam, VfoldCV, COPPS, CV1, CVmod

Examples

# call with default parameters:
# 5-fold cross-validation with absolute error loss, least squares estimation,
# and possible parameters being 0 to 5 change-points
# a simpler access to it is offered by VfoldCV()
crossvalidationCP(Y = rnorm(100))

# more interesting data and more detailed output
set.seed(1L)
Y <- c(rnorm(50), rnorm(50, 5), rnorm(50), rnorm(50, 5))
crossvalidationCP(Y = Y, output = "detailed")
# finds the correct change-points at 50, 100, 150
# (plus the start and end points 0 and 200)

# list of parameters, only allowing 1 or 2 change-points
crossvalidationCP(Y = Y, param = as.list(1:2))

# reducing the number of folds to 3
ret <- crossvalidationCP(Y = Y, folds = 3L, output = "detailed")
# the same but with explicitly specified folds
identical(crossvalidationCP(Y = Y, folds = list(seq(1, 200, 3), seq(2, 200, 3), seq(3, 200, 3)),
                            output = "detailed"), ret)
                            
# 2-fold cross-validation with Order-Preserved Sample-Splitting
ret <- crossvalidationCP(Y = Y, folds = "COPPS", output = "detailed")

# a simpler access to it is offered by CV1()
identical(CV1(Y = Y, output = "detailed"), ret)

# different criterion: quadratic error loss
ret <- crossvalidationCP(Y = Y, folds = "COPPS", output = "detailed", criterion = criterionL2loss)

# same as COPPS procedure; as offered by COPPS()
identical(COPPS(Y = Y, output = "detailed"), ret)

# COPPS potentially fails to provide a good selection when large changes occur at odd locations
# Example 1 in (Pein and Shah, 2021), see Section 2.2 in this paper for more details
set.seed(1)
exampleY <- rnorm(102, c(rep(10, 46), rep(0, 5), rep(30, 51)))
# misses one change-point
crossvalidationCP(Y = exampleY, folds = "COPPS", criterion = criterionL2loss) 

# correct number of change-points when modified criterion (or absolute error loss) is used
(ret <- crossvalidationCP(Y = exampleY, folds = "COPPS", criterion = criterionMod)) 

# a simpler access to it is offered by CVmod() 
identical(CVmod(Y = exampleY), ret)

# manually given criterion; identical to criterionL1loss()
testCriterion <- function(testset, estset, value = NULL, ...) {
  if (!is.null(value)) {
    return(sum(abs(testset - value)))
  }
  
  sum(abs(testset - mean(estset)))
}
identical(crossvalidationCP(Y = Y, criterion = testCriterion, output = "detailed"),
          crossvalidationCP(Y = Y, output = "detailed"))
          
# PELT as a local estimator instead of least squares estimation
# param must contain parameters that are acceptable for the given estimator
crossvalidationCP(Y = Y, estimator = pelt, output = "detailed",
                  param = list("SIC", "MBIC", 3 * log(length(Y))))

# argument minseglen of pelt specified in ...
crossvalidationCP(Y = Y, estimator = pelt, output = "detailed",
                  param = list("SIC", "MBIC", 3 * log(length(Y))), minseglen = 60)

[Package crossvalidationCP version 1.1 Index]