R: Backtesting Semi-ARMA Models with Rolling Forecasts

rollCast {smoots}

R Documentation

Backtesting Semi-ARMA Models with Rolling Forecasts

Description

A simple backtest of Semi-ARMA models via rolling forecasts can be implemented.

Usage

rollCast(
  y,
  p = NULL,
  q = NULL,
  K = 5,
  method = c("norm", "boot"),
  alpha = 0.95,
  np.fcast = c("lin", "const"),
  it = 10000,
  n.start = 1000,
  pb = TRUE,
  cores = future::availableCores(),
  argsSmoots = list(),
  plot = TRUE,
  argsPlot = list()
)

Arguments

`y`	a numeric vector that represents the equidistant time series assumed to follow a Semi-ARMA model; must be ordered from past to present.
`p`	an integer value `\geq 0` that defines the AR order `p` of the underlying ARMA(`p,q`) model within `X`; is set to `NULL` by default; if no value is passed to `p` but one is passed to `q`, `p` is set to `0`; if both `p` and `q` are `NULL`, optimal orders following the BIC for `0 \leq p,q \leq 5` are chosen; is set to `NULL` by default; decimal numbers will be rounded off to integers.
`q`	an integer value `\geq 0` that defines the MA order `q` of the underlying ARMA(`p,q`) model within `X`; is set to `NULL` by default; if no value is passed to `q` but one is passed to `p`, `q` is set to `0`; if both `p` and `q` are `NULL`, optimal orders following the BIC for `0 \leq p,q \leq 5` are chosen; is set to `NULL` by default; decimal numbers will be rounded off to integers.
`K`	a single, positive integer value that defines the number of out-of-sample observations; the last `K` observations in `y` are treated as the out-of-sample observations, whereas the rest of the observations in `y` are the in-sample values.
`method`	a character object; defines the method used for the calculation of the forecasting intervals; with `"norm"` the intervals are obtained under the assumption of normally distributed innovations; with `"boot"` the intervals are obtained via a bootstrap; is set to `"norm"` by default.
`alpha`	a numeric vector of length 1 with `0 <` `alpha` `< 1`; the forecasting intervals will be obtained based on the confidence level (`100alpha`)-percent; is set to `alpha = 0.95` by default, i.e., a `95`-percent confidence level.
`np.fcast`	a character object; defines the forecasting method used for the nonparametric trend; for `np.fcast = "lin"` the trend is is extrapolated linearly based on the last two trend estimates; for `np.fcast = "const"`, the last trend estimate is used as a constant estimate for future values; is set to "lin" by default.
`it`	an integer that represents the total number of iterations, i.e., the number of simulated series; is set to `10000` by default; only necessary, if `method = "boot"`; decimal numbers will be rounded off to integers.
`n.start`	an integer that defines the 'burn-in' number of observations for the simulated ARMA series via bootstrap; is set to `1000` by default; only necessary, if `method = "boot"`;decimal numbers will be rounded off to integers.
`pb`	a logical value; for `pb = TRUE`, a progress bar will be shown in the console, if `method = "boot"`.
`cores`	an integer value >0 that states the number of (logical) cores to use in the bootstrap (or `NULL`); the default is the maximum number of available cores (via `future::availableCores`); for `cores = NULL`, parallel computation is disabled.
`argsSmoots`	a list that contains arguments that will be passed to `msmooth` for the estimation of the nonparametric trend function; by default, the default values of `msmooth` are used.
`plot`	a logical value that controls the graphical output; for the default (`plot = TRUE`), the original series with the obtained point forecasts as well as the forecasting intervals will be plotted; for `plot = FALSE`, no plot will be created.
`argsPlot`	a list; additional arguments for the standard plot function, e.g., `xlim`, `type`, ..., can be passed to it; arguments with respect to plotted graphs, e.g., the argument `col`, only affect the original series `y`; please note that in accordance with the argument `x` (lower case) of the standard plot function, an additional numeric vector with time points can be implemented via the argument `x` (lower case).

Details

Define that an observed, equidistant time series y_t, with t = 1, 2, ..., n, follows

y_t = m(x_t) + \epsilon_t,

where x_t = t/n is the rescaled time on the closed interval [0,1] and m(x_t) is a nonparametric and deterministic trend function (see Beran and Feng, 2002, and Feng, Gries and Fritz, 2020). \epsilon_t, on the other hand, is a stationary process with E(\epsilon_t) = 0 and short-range dependence. For the purpose of this function, \epsilon_t is assumed to follow an autoregressive-moving-average (ARMA) model with

\epsilon_t = \zeta_t + \beta_1 \epsilon_{t-1} + ... + \beta_p \epsilon_{t-p} + \alpha_1 \zeta_{t-1} + ... + \alpha_q \zeta_{t-q}.

Here, the random variables \zeta_t are identically and independently distributed (i.i.d.) with zero-mean and a constant variance and the coefficients \alpha_j and \beta_i, i = 1, 2, ..., p and j = 1, 2, ..., q, are real numbers. The combination of both previous formulas will be called a semiparametric ARMA (Semi-ARMA) model.

An explicit forecasting method of Semi-ARMA models is described in modelCast. To backtest a selected model, a slightly adjusted procedure is used. The data is divided into in-sample and an out-of-sample values (usually the last K = 5 observations in the data are reserved for the out-of-sample observations). A model is fitted to the in-sample data, whereas one-step rolling point forecasts and forecasting intervals are obtained for the out-of-sample time points. The proposed forecasts of the trend are either a linear or a constant extrapolation of the trend with negligible forecasting intervals, whereas the point forecasts of the stationary rest term are obtained via the selected ARMA(p,q) model (see Fritz et al., 2020). The corresponding forecasting intervals are calculated under the assumption that the innovations \zeta_t are either normally distributed (see e.g. pp. 93-94 in Brockwell and Davis, 2016) or via a forward bootstrap (see Lu and Wang, 2020). For a one-step forecast for time point t, all observations until time point t-1 are assumed to be known.

The function calculates three important values for backtesting: the number of breaches, i.e. the number of true observations that lie outside of the forecasting intervals, the mean absolute scaled error (MASE, see Hyndman and Koehler, 2006) and the root mean squared scaled error (RMSSE, see Hyndman and Koehler, 2006) are obtained. For the MASE, a value < 1 indicates a better average forecasting potential than a naive forecasting approach. Furthermore, it is independent from the scale of the data and can thus be used to compare forecasts of different datasets. Closely related is the RMSSE, however here, the mean of the squared forecasting errors is computed and scaled by the mean of the squared naive forecasting approach. Then the root of that value is the RMSSE. Due to the close relation, the interpretation of the RMSSE is similarly but not identically to the interpretation of the MASE. Of course, a value close to zero is preferred in both cases.

To make use of the function, a numeric vector with the values of a time series that is assumed to follow a Semi-ARMA model needs to be passed to the argument y. Moreover, the arguments p and q represent the AR and MA orders, respectively, of the underlying ARMA process in the parametric part of the model. If both values are set to NULL, an optimal order in accordance with the Bayesian Information Criterion (BIC) will be selected. If only one of the values is NULL, it will be changed to zero instead. K defines the number of the out-of-sample observations; these will be cut off the end of y, while the remaining observations are treated as the in-sample observations. For the K out-of-sample time points, rolling forecasts will be obtained. method describes the method to use for the computation of the prediction intervals. Under the normality assumption for the innovations \zeta_t, intervals can be obtained via method = "norm". However, if the assumption does not hold, a bootstrap can be implemented as well (method = "boot"). Both approaches are explained in more detail in normCast and bootCast, respectively. With alpha, the confidence level of the forecasting intervals can be adjusted, as the (100alpha)-percent forecasting intervals will be computed. By means of the argument np.fcast, the forecasting method for the nonparametric trend function can be defined. Selectable are a linear (np.fcast = "lin") and a constant (np.fcast = "const") extrapolation. For more information on these methods, we refer the reader to trendCast.

it, n.start, pb and cores are only relevant for method = "boot". With it the total number of bootstrap iterations is defined, whereas n.start regulates, how many 'burn-in' observations are generated for each simulated ARMA process in the bootstrap. Since a bootstrap may take a longer computation time, with the argument cores the number of cores for parallel computation of the bootstrap iterations can be defined. Nonetheless, for cores = NULL, no cluster is created and therefore the parallel computation is disabled. Note that the bootstrapped results are fully reproducible for all cluster sizes. Moreover, for pb = TRUE, the progress of the bootstrap approach can be observed in the R console via a progress bar. Additional information on these four function arguments can be found in bootCast.

The argument argsSmoots is a list. In this list, different arguments of the function msmooth can be implemented to adjust the estimation of the nonparametric part of the complete model. The arguments of the smoothing function are described in msmooth.

rollCast allows for a quick plot of the results. If the logical argument plot is set to TRUE, a graphic with default settings is created. Nevertheless, users are allowed to implement further arguments of the standard plot function in the list argsPlot. For example, the limits of the plot can be adjusted by xlim and ylim. Furthermore, an argument x can be included in argsPlot with the actual equidistant time points of the whole series (in-sample & out-of-sample observations). Otherwise, simply 1:n is used as the in-sample time points by default.

NOTE:

Within this function, the arima function of the stats package with its method "CSS-ML" is used throughout for the estimation of ARMA models. Furthermore, to increase the performance, C++ code via the Rcpp and RcppArmadillo packages was implemented. Also, the future and future.apply packages are considered for parallel computation of bootstrap iterations. The progress of the bootstrap is shown via the progressr package.

Value

A list with different elements is returned. The elements are as follows.

alpha: a single numeric value; it describes, what confidence level (100alpha)-percent has been considered for the forecasting intervals.
breach: a logical vector that states whether the K true out-of-sample observations lie outside of the forecasting intervals, respectively; a breach is denoted by TRUE.
breach.val: a numeric vector that contains the margin of the breaches (in absolute terms) for the K out-of-sample time points; if a breach did not occur, the respective element is set to zero.
error: a numeric vector that contains the simulated empirical values of the forecasting error for method = "boot"; otherwise, it is set to NULL.
fcast.rest: a numeric vector that contains the K point forecasts of the parametric part of the model.
fcast.roll: a numeric matrix that contains the K rolling point forecasts as well as the values of the bounds of the respective forecasting intervals for the complete model; the first row contains the point forecasts, the lower bounds of the forecasting intervals are in the second row and the upper bounds can be found in the third row.
fcast.trend: a numeric vector that contains the K obtained trend forecasts.
K: a positive integer; states the number of out-of-sample observations as well as the number of forecasts for the out-of-sample time points.
MASE: the obtained value of the mean average scaled error for the selected model.
method: a character object that states, whether the forecasting intervals were obtained via a bootstrap (method = "boot") or under the normality assumption for the innovations (method = "norm").
model.nonpar: the output (usually a list) of the nonparametric trend estimation via msmooth.
model.par: the output (usually a list) of the parametric ARMA estimation of the detrended series via arima.
n: the number of observations (in-sample & out-of-sample observations).
n.in: the number of in-sample observations (n - n.out).
n.out: the number of out-of-sample observations (equals K).
np.fcast: a character object that states the applied forecasting method for the nonparametric trend function; either a linear ( np.fcast = "lin") or a constant np.fcast = "const" are possible.
quants: a numeric vector of length 2 with the [100(1 - alpha)/2]-percent and {100[1 - (1 - alpha)/2]}-percent quantiles of the forecasting error distribution.
RMSSE: the obtained value of the root mean squared scaled error for the selected model.
y: a numeric vector that contains all true observations (in-sample & out-of-sample observations).
y.in: a numeric vector that contains all in-sample observations.
y.out: a numeric vector that contains the K out-of-sample observations.

Author(s)

Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Package Creator and Maintainer

References

Beran, J., and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54, 291-311.

Brockwell, P. J., and Davis, R. A. (2016). Introduction to time series and forecasting, 3rd edition. Springer.

Fritz, M., Forstinger, S., Feng, Y., and Gries, T. (2020). Forecasting economic growth processes for developing economies. Unpublished.

Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.

Hyndman, R. J., and Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22:4, 679-688.

Lu, X., and Wang, L. (2020). Bootstrap prediction interval for ARMA models with unknown orders. REVSTAT–Statistical Journal, 18:3, 375-396.

Examples

lgdp <- log(smoots::gdpUS$GDP)
time <- seq(from = 1947.25, to = 2019.5, by = 0.25)
backtest <- rollCast(lgdp, K = 5,
 argsPlot = list(x = time, xlim = c(2012, 2019.5), col = "forestgreen",
 type = "b", pch = 20, lty = 2, main = "Example"))
backtest

[Package smoots version 1.1.4 Index]