VfoldCV {crossvalidationCP} R Documentation

V-fold cross-validation

Description

Selects the number of change-points by minimizing a V-fold cross-validation criterion. The criterion, the estimator, and the number of folds can be specified by the user.

Usage

VfoldCV(Y, V = 5L, Kmax = 8L, adaptiveKmax = TRUE, tolKmax = 3L, estimator = leastSquares,
        criterion = criterionL1loss, output = c("param", "fit", "detailed"), ...) 

Arguments

Y

the observations; can be any data type that supports the function length and the operator [] and that can be passed to estimator and criterion, e.g. a numeric vector or a list. Support for matrices, i.e. for multivariate data, is planned but not yet implemented

V

a single integer giving the number of folds. Ordered folds will be created automatically, i.e. fold i will be seq(i, length(Y), V)

Kmax

a single integer giving the maximal number of change-points

adaptiveKmax

a single logical indicating whether Kmax should be chosen adaptively. If TRUE, Kmax will be doubled if the estimated number of change-points is not at least tolKmax smaller than Kmax

tolKmax

a single integer specifying by how much the estimated number of change-points has to be smaller than Kmax

estimator

a function providing a local estimate. For pre-implemented estimators see estimators. The function must have the arguments Y, param and ..., where Y will be a subset of the observations, param will be list(0:Kmax), and ... will be the argument ... of VfoldCV. Note that ... will be passed to both estimator and criterion. The return value must be either a list of length length(param), with each entry containing the estimated change-point locations for the corresponding entry in param, or a list containing the named entries cps and value. In the latter case, cps has to be a list of the estimated change-points as before, and value has to be a list of the locally estimated values for each entry in param, i.e. each list entry must itself be a list that is one entry longer than the corresponding entry in cps. The function convertSingleParam converts an estimator that allows a single parameter into an estimator that allows multiple parameters. Of the currently pre-implemented estimators, only leastSquares accepts param == list(0:Kmax). Estimators that allow param to differ from list(0:Kmax) can be used in crossvalidationCP

criterion

a function providing the cross-validation criterion. For pre-implemented criteria see criteria. The function must have the arguments testset, estset and value. testset and estset are the observations of one segment that are in the test and estimation set, respectively. value is the local parameter on the segment if provided by estimator, otherwise NULL. Additionally, ... is possible and potentially necessary to absorb arguments, since the argument ... of VfoldCV will be passed to both estimator and criterion. The function must return a single numeric. The return values will be summed for each parameter, and which.min will be called on the resulting vector to determine the parameter with the smallest criterion. Since which.min ignores NA and NaN entries, some such values are allowed

output

a string specifying the output, either "param", "fit" or "detailed". For details on what they mean, see Value

...

additional parameters that are passed to estimator and criterion
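The ordered folds described for the argument V can be sketched in base R as follows. This is only an illustration of the index pattern seq(i, length(Y), V); the helper orderedFolds is a hypothetical name and is not part of crossvalidationCP, and the sketch assumes that the number of observations is at least V.

```r
# Hypothetical helper (not part of crossvalidationCP):
# fold i contains the indices i, i + V, i + 2 * V, ...
# Assumes n >= V, otherwise seq() would fail for the last folds.
orderedFolds <- function(n, V) {
  lapply(seq_len(V), function(i) seq(i, n, V))
}

folds <- orderedFolds(n = 12L, V = 5L)
folds[[1]]  # 1 6 11
folds[[5]]  # 5 10

# every index belongs to exactly one fold
stopifnot(all(sort(unlist(folds)) == 1:12))
```

Because the folds interleave the observations, each fold is spread over the whole index range rather than covering one contiguous block.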

Value

If output == "param", the selected number of change-points, i.e. an integer between 0 and Kmax. If output == "fit", a list with the entries param, giving the selected number of change-points, and fit. The named entry fit is a list giving the fit obtained by applying estimator to the whole data Y with the selected tuning parameter. The returned value is transformed to a list with an entry cps giving the estimated change-points and, if provided by estimator, an entry value giving the estimated local values. If output == "detailed", the same as for output == "fit", but with an additional entry CP giving all calculated cross-validation criteria. Those values are summed over all folds

References

Pein, F., and Shah, R. D. (2021) Cross-validation for change-point regression: pitfalls and solutions. arXiv:2112.03220.

See Also

estimators, criteria, convertSingleParam

Examples

# call with default parameters:
# 5-fold cross-validation with absolute error loss, least squares estimation,
# and 0 to 8 change-points (the default Kmax = 8L)
VfoldCV(Y = rnorm(100))

# more interesting data and more detailed output
set.seed(1L)
Y <- c(rnorm(50), rnorm(50, 5), rnorm(50), rnorm(50, 5))
VfoldCV(Y = Y, output = "detailed")
# finds the correct change-points at 50, 100, 150
# (plus the start and end points 0 and 200)

# reducing the number of folds to 3
VfoldCV(Y = Y, V = 3L, output = "detailed")

# reducing the maximal number of change-points to 2
VfoldCV(Y = Y, Kmax = 2)

# different criterion: modified error loss
VfoldCV(Y = Y, output = "detailed", criterion = criterionMod)

# manually given criterion; identical to criterionL1loss()
testCriterion <- function(testset, estset, value = NULL, ...) {
  if (!is.null(value)) {
    return(sum(abs(testset - value)))
  }
  
  sum(abs(testset - mean(estset)))
}
identical(VfoldCV(Y = Y, criterion = testCriterion, output = "detailed"),
          VfoldCV(Y = Y, output = "detailed"))
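As a further illustration of the interface required by the argument criterion, the following sketch defines a squared error loss with the arguments testset, estset, value and .... The name squaredErrorCriterion is hypothetical and chosen only for this example; it is not claimed to match any pre-implemented criterion. The last call shows the base R behaviour of which.min that makes some NaN values tolerable.

```r
# Hypothetical criterion (illustrative name): squared error loss.
# Uses the local value if the estimator provided one,
# otherwise falls back to the mean of the estimation set.
squaredErrorCriterion <- function(testset, estset, value = NULL, ...) {
  if (is.null(value)) {
    value <- mean(estset)
  }
  sum((testset - value)^2)
}

# without a local value: mean(estset) = 2 is used
squaredErrorCriterion(testset = c(1, 2, 3), estset = c(2, 2))  # 2

# with a local value supplied by the estimator
squaredErrorCriterion(testset = c(1, 2, 3), estset = c(2, 2), value = 0)  # 14

# which.min discards NA and NaN entries
which.min(c(3, NaN, 1, NA))  # 3
```

Such a function could then be supplied via criterion = squaredErrorCriterion in a call to VfoldCV.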

[Package crossvalidationCP version 1.1 Index]