R: KNN Forecast Bootstrap Prediction Intervals

knn.forecast.boot.intervals {knnwtsim}

R Documentation

KNN Forecast Bootstrap Prediction Intervals

Description

Forecasts a series using KNN regression and returns bootstrap prediction intervals. The approach follows the description of "Prediction intervals from bootstrapped residuals" in section 5.5 of Hyndman R, Athanasopoulos G (2021) https://otexts.com/fpp3/prediction-intervals.html#prediction-intervals-from-bootstrapped-residuals, modified as needed for use with KNN regression.

The algorithm first builds a pool of forecast errors to sample from later. If there are n points before the first observation indicated in f.index.in, the pool contains n - k.in errors, generated by one-step-ahead forecasts starting at index k.in + 1 of the response series. The first k.in points cannot be estimated because at least k.in eligible neighbors are needed. The optional burn.in argument increases the number of points at the start of the series that must be available as neighbors before errors are calculated for the pool.

Next, B possible paths the series could take are simulated using the pool of errors. Each path is simulated by calling knn.forecast() to estimate the first point in f.index.in, adding a sampled forecast error to that estimate, and appending the result to the end of the series. This process is repeated for the next point in f.index.in until all points have been estimated.

The interval estimates for each point in f.index.in are the appropriate percentiles of the simulated values of that point. The mean and median are calculated from the same simulations. One important implication is that the mean forecast returned by this function can differ from the point forecast produced by knn.forecast() alone.
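The procedure described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: `toy_knn_one_step()` is a hypothetical stand-in for knn.forecast() that takes an unweighted average of the k most similar earlier points, the similarity matrix is a made-up time-closeness measure, and the symmetric percentile choice for the bounds is an assumption.

```r
# Hypothetical stand-in for knn.forecast(): average the responses of the
# k most similar points that come before the target index.
toy_knn_one_step <- function(sim_mat, y, target, k) {
  candidates <- seq_len(target - 1)  # only earlier points are eligible
  sims <- sim_mat[target, candidates]
  nn <- candidates[order(sims, decreasing = TRUE)[seq_len(k)]]
  mean(y[nn])
}

set.seed(1)
n <- 40
y <- sin(seq_len(n) / 3) + rnorm(n, sd = 0.1)
# Assumed similarity for illustration: closeness in time order
sim_mat <- 1 / (1 + abs(outer(seq_len(n), seq_len(n), "-")))

k <- 3
f_index <- 36:40
B <- 100
level <- 0.95

# Step 1: pool of one-step-ahead errors for indices k + 1, ..., n_train,
# where n_train is the number of points before the first forecast index.
n_train <- min(f_index) - 1
err_idx <- (k + 1):n_train
errors <- vapply(
  err_idx,
  function(i) y[i] - toy_knn_one_step(sim_mat, y, i, k),
  numeric(1)
)

# Step 2: simulate B paths, extending the series one point at a time.
paths <- matrix(NA_real_, nrow = B, ncol = length(f_index))
for (b in seq_len(B)) {
  y_sim <- y[seq_len(n_train)]
  for (j in seq_along(f_index)) {
    i <- f_index[j]
    point <- toy_knn_one_step(sim_mat, y_sim, i, k)
    y_sim[i] <- point + sample(errors, 1)  # add a resampled forecast error
    paths[b, j] <- y_sim[i]
  }
}

# Step 3: summarize each forecast point across the simulated paths.
alpha <- 1 - level
lb <- apply(paths, 2, quantile, probs = alpha / 2, names = FALSE)
ub <- apply(paths, 2, quantile, probs = 1 - alpha / 2, names = FALSE)
path_mean <- colMeans(paths)
```

Here the error pool has n_train - k entries, matching the n - k.in count given above. Because each simulated value is fed back into the series before the next point is forecast, uncertainty accumulates along the path, which is one reason the mean of the paths need not equal a direct point forecast.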

Usage

knn.forecast.boot.intervals(
  Sim.Mat.in,
  f.index.in,
  k.in,
  y.in,
  burn.in = NULL,
  B = 200,
  return.simulations = FALSE,
  level = 0.95
)

Arguments

`Sim.Mat.in`	numeric and symmetric matrix of similarities (recommend use of `S_w`, see `SwMatrixCalc()`).
`f.index.in`	numeric vector indicating the indices of `Sim.Mat.in` and `y.in` which correspond to the time order of the points to be forecast.
`k.in`	integer value indicating the number of nearest neighbors to be considered in forecasting, must be `>= 1`.
`y.in`	numeric vector of the response series to be forecast.
`burn.in`	integer value which indicates how many points at the start of the series to set aside as eligible neighbors before calculating forecast errors to be re-sampled.
`B`	integer value representing the number of bootstrap replications, i.e. the number of forecast paths simulated and used to calculate the outputs, must be `>= 1`.
`return.simulations`	logical value indicating whether to return all simulated forecasts.
`level`	numeric value in the open interval (0, 1) indicating the confidence level for the prediction intervals.

Value

list of the following components:

lb: numeric vector of the same length as f.index.in, with the estimated lower bound of the prediction interval.
ub: numeric vector of the same length as f.index.in, with the estimated upper bound of the prediction interval.
mean: numeric vector of the same length as f.index.in, with the mean of the B simulated paths for each forecasted point.
median: numeric vector of the same length as f.index.in, with the median of the B simulated paths for each forecasted point.
simulated.paths: numeric matrix where each of the B rows contains a simulated path for the points in f.index.in, only returned if return.simulations = TRUE.
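When return.simulations = TRUE, the simulated.paths matrix can be summarized again without re-running the bootstrap, for example to obtain intervals at a different level. The sketch below uses a made-up matrix in place of real output, and `interval_from_paths()` is a hypothetical helper. It assumes symmetric percentiles and the default method of `quantile()`, which may not match the package's internal calculation exactly.

```r
# Made-up stand-in for the simulated.paths component:
# B = 200 rows (paths), 5 columns (points in f.index.in).
set.seed(42)
sim_paths <- matrix(rnorm(200 * 5, mean = 10, sd = 2), nrow = 200, ncol = 5)

# Hypothetical helper: percentile interval at an arbitrary level,
# computed column by column (one column per forecast point).
interval_from_paths <- function(paths, level) {
  alpha <- 1 - level
  list(
    lb = apply(paths, 2, quantile, probs = alpha / 2, names = FALSE),
    ub = apply(paths, 2, quantile, probs = 1 - alpha / 2, names = FALSE)
  )
}

int80 <- interval_from_paths(sim_paths, 0.80)
int95 <- interval_from_paths(sim_paths, 0.95)

# Mean and median of the simulated values for each forecast point
path_mean <- colMeans(sim_paths)
path_median <- apply(sim_paths, 2, median)
```

Each returned vector has one entry per forecast point, and the 80% interval always sits inside the 95% interval because both are read from the same simulations.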

Examples

data("simulation_master_list")
series.index <- 15
ex.series <- simulation_master_list[[series.index]]$series.lin.coef.chng.x

# Weights pre-tuned by random search, in alpha, beta, gamma order
pre.tuned.wts <- c(0.2148058, 0.2899638, 0.4952303)
pre.tuned.k <- 5

df <- data.frame(ex.series)
# Generate vector of time orders
df$t <- seq_len(nrow(df))

# Generate vector of periods
nperiods <- simulation_master_list[[series.index]]$seasonal.periods
df$p <- rep(1:nperiods, length.out = nrow(df))

# Pull corresponding exogenous predictor(s)
X <- as.matrix(simulation_master_list[[series.index]]$x.chng)


# Calculate the weighted similarity matrix using Sw
Sw.ex <- SwMatrixCalc(
  t.in = df$t,
  p.in = df$p, nPeriods.in = nperiods,
  X.in = X,
  weights = pre.tuned.wts
)

n <- length(ex.series)
# Indices to forecast: the last 5 points of the series
f.index <- (n - 5 + 1):n

interval.forecast <- knn.forecast.boot.intervals(
  Sim.Mat.in = Sw.ex,
  f.index.in = f.index,
  y.in = ex.series,
  k.in = pre.tuned.k
)