rollCast {smoots} | R Documentation |
Backtesting Semi-ARMA Models with Rolling Forecasts
Description
A simple backtest of Semi-ARMA models via rolling forecasts can be implemented.
Usage
rollCast(
y,
p = NULL,
q = NULL,
K = 5,
method = c("norm", "boot"),
alpha = 0.95,
np.fcast = c("lin", "const"),
it = 10000,
n.start = 1000,
pb = TRUE,
cores = future::availableCores(),
argsSmoots = list(),
plot = TRUE,
argsPlot = list()
)
Arguments
y |
a numeric vector that represents the equidistant time series assumed to follow a Semi-ARMA model; must be ordered from past to present. |
p |
an integer value |
q |
an integer value |
K |
a single, positive integer value that defines the number of
out-of-sample observations; the last |
method |
a character object; defines the method used for the calculation
of the forecasting intervals; with |
alpha |
a numeric vector of length 1 with |
np.fcast |
a character object; defines the forecasting method used
for the nonparametric trend; for |
it |
an integer that represents the total number of iterations, i.e.,
the number of simulated series; is set to |
n.start |
an integer that defines the 'burn-in' number
of observations for the simulated ARMA series via bootstrap; is set to
|
pb |
a logical value; for |
cores |
an integer value >0 that states the number of (logical) cores to
use in the bootstrap (or |
argsSmoots |
a list that contains arguments that will be passed to
|
plot |
a logical value that controls the graphical output; for the
default ( |
argsPlot |
a list; additional arguments for the standard plot function,
e.g., |
Details
Define that an observed, equidistant time series y_t
, with
t = 1, 2, ..., n
, follows
y_t = m(x_t) + \epsilon_t,
where x_t = t/n
is the rescaled time on the closed
interval [0,1]
and m(x_t)
is a nonparametric and
deterministic trend function (see Beran and Feng, 2002, and Feng, Gries and
Fritz, 2020).
\epsilon_t
, on the other hand, is a stationary process
with E(\epsilon_t) = 0
and short-range dependence.
For the purpose of this function, \epsilon_t
is assumed
to follow an autoregressive-moving-average (ARMA) model with
\epsilon_t = \zeta_t + \beta_1 \epsilon_{t-1} + ... + \beta_p
\epsilon_{t-p} + \alpha_1 \zeta_{t-1} + ... +
\alpha_q \zeta_{t-q}.
Here, the random variables \zeta_t
are identically and
independently distributed (i.i.d.) with zero-mean and a constant variance
and the coefficients \alpha_j
and \beta_i
,
i = 1, 2, ..., p
and j = 1, 2, ..., q
, are real numbers.
The combination of both previous formulas will be called a semiparametric
ARMA (Semi-ARMA) model.
An explicit forecasting method of Semi-ARMA models is described in
modelCast
. To backtest a selected model, a slightly adjusted
procedure is used. The data is divided into in-sample and an
out-of-sample values (usually the last K = 5
observations in the data
are reserved for the out-of-sample observations). A model is fitted to the
in-sample data, whereas one-step rolling point forecasts and forecasting
intervals are obtained for the out-of-sample time points. The proposed
forecasts of the trend are either a linear or a constant extrapolation of
the trend with negligible forecasting intervals, whereas the point forecasts
of the stationary rest term are obtained via the selected ARMA(p,q
)
model (see Fritz et al., 2020). The corresponding forecasting intervals
are calculated under the assumption that the innovations
\zeta_t
are either normally distributed (see e.g. pp.
93-94 in Brockwell and Davis, 2016) or via a forward bootstrap (see Lu and
Wang, 2020). For a one-step forecast for time point t
, all observations
until time point t-1
are assumed to be known.
The function calculates three important values for backtesting: the number
of breaches, i.e. the number of true observations that lie outside of the
forecasting intervals, the mean absolute scaled error (MASE, see Hyndman
and Koehler, 2006) and the root mean squared scaled error (RMSSE, see
Hyndman and Koehler, 2006) are obtained. For the MASE, a value < 1
indicates a better average forecasting potential than a naive forecasting
approach.
Furthermore, it is independent from the scale of the data and can thus be
used to compare forecasts of different datasets. Closely related is the
RMSSE, however here, the mean of the squared forecasting errors is computed
and scaled by the mean of the squared naive forecasting approach. Then the
root of that value is the RMSSE. Due to the close relation, the
interpretation of the RMSSE is similarly but not identically to the
interpretation of the MASE. Of course, a value close to zero is preferred
in both cases.
To make use of the function, a numeric vector with the values of a time
series that is assumed to follow a Semi-ARMA model needs to be passed to
the argument y
. Moreover, the arguments p
and q
represent the AR and MA orders, respectively, of the underlying ARMA
process in the parametric part of the model. If both values are set to
NULL
, an optimal order in accordance with the Bayesian Information
Criterion (BIC) will be selected. If only one of the values is NULL
,
it will be changed to zero instead. K
defines the number of the
out-of-sample observations; these will be cut off the end of y
, while
the remaining observations are treated as the in-sample observations. For the
K
out-of-sample time points, rolling forecasts will be obtained.
method
describes the method to use for the computation of the
prediction intervals. Under the normality assumption for the innovations
\zeta_t
, intervals can be obtained via
method = "norm". However, if the assumption does not hold, a
bootstrap can be implemented as well (method = "boot"). Both
approaches are explained in more detail in normCast
and
bootCast
, respectively. With alpha
, the confidence
level of the forecasting intervals can be adjusted, as the
(100
alpha
)-percent forecasting intervals will be computed. By
means of the argument np.fcast
, the forecasting method for the
nonparametric trend function can be defined. Selectable are a linear
(np.fcast = "lin"
) and a constant (np.fcast = "const"
)
extrapolation. For more information on these methods, we refer the reader to
trendCast
.
it
, n.start
, pb
and cores
are only
relevant for method = "boot"
. With it
the total number of
bootstrap iterations is defined, whereas n.start
regulates, how
many 'burn-in' observations are generated for each simulated ARMA process
in the bootstrap. Since a bootstrap may take a longer computation time,
with the argument cores
the number of cores for parallel computation
of the bootstrap iterations can be defined. Nonetheless, for
cores = NULL
, no cluster is created and therefore
the parallel computation is disabled. Note that the bootstrapped results are
fully reproducible for all cluster sizes. Moreover, for pb = TRUE
,
the progress of the bootstrap approach can be observed in the R console via
a progress bar. Additional information on these four function arguments can
be found in bootCast
.
The argument argsSmoots
is a list. In this list, different arguments
of the function msmooth
can be implemented to adjust the
estimation of the nonparametric part of the complete model. The arguments
of the smoothing function are described in msmooth
.
rollCast
allows for a quick plot of the results. If the logical
argument plot
is set to TRUE
, a graphic with default
settings is created. Nevertheless, users are allowed to implement further
arguments of the standard plot function in the list argsPlot
. For
example, the limits of the plot can be adjusted by xlim
and
ylim
. Furthermore, an argument x
can be included in
argsPlot
with the actual equidistant time points of the whole series
(in-sample & out-of-sample observations). Otherwise, simply 1:n
is
used as the in-sample time points by default.
NOTE:
Within this function, the arima
function of the
stats
package with its method "CSS-ML"
is used throughout for
the estimation of ARMA models. Furthermore, to increase the performance,
C++ code via the Rcpp
and
RcppArmadillo
packages
was implemented. Also, the future
and
future.apply
packages are
considered for parallel computation of bootstrap iterations. The progress
of the bootstrap is shown via the
progressr
package.
Value
A list with different elements is returned. The elements are as follows.
- alpha
a single numeric value; it describes, what confidence level (
100
alpha
)-percent has been considered for the forecasting intervals.- breach
a logical vector that states whether the
K
true out-of-sample observations lie outside of the forecasting intervals, respectively; a breach is denoted byTRUE
.- breach.val
a numeric vector that contains the margin of the breaches (in absolute terms) for the
K
out-of-sample time points; if a breach did not occur, the respective element is set to zero.- error
a numeric vector that contains the simulated empirical values of the forecasting error for
method = "boot"
; otherwise, it is set toNULL
.- fcast.rest
a numeric vector that contains the
K
point forecasts of the parametric part of the model.- fcast.roll
a numeric matrix that contains the
K
rolling point forecasts as well as the values of the bounds of the respective forecasting intervals for the complete model; the first row contains the point forecasts, the lower bounds of the forecasting intervals are in the second row and the upper bounds can be found in the third row.- fcast.trend
a numeric vector that contains the
K
obtained trend forecasts.- K
a positive integer; states the number of out-of-sample observations as well as the number of forecasts for the out-of-sample time points.
- MASE
the obtained value of the mean average scaled error for the selected model.
- method
a character object that states, whether the forecasting intervals were obtained via a bootstrap (
method = "boot"
) or under the normality assumption for the innovations (method = "norm"
).- model.nonpar
the output (usually a list) of the nonparametric trend estimation via
msmooth
.- model.par
the output (usually a list) of the parametric ARMA estimation of the detrended series via
arima
.- n
the number of observations (in-sample & out-of-sample observations).
- n.in
the number of in-sample observations (
n - n.out
).- n.out
the number of out-of-sample observations (equals
K
).- np.fcast
a character object that states the applied forecasting method for the nonparametric trend function; either a linear (
np.fcast = "lin"
) or a constantnp.fcast = "const"
are possible.- quants
a numeric vector of length 2 with the
[100(1 -
alpha
)/2]
-percent and {100
[1 - (1 -
alpha
)/2]
}-percent quantiles of the forecasting error distribution.- RMSSE
the obtained value of the root mean squared scaled error for the selected model.
- y
a numeric vector that contains all true observations (in-sample & out-of-sample observations).
- y.in
a numeric vector that contains all in-sample observations.
- y.out
a numeric vector that contains the
K
out-of-sample observations.
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Package Creator and Maintainer
References
Beran, J., and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54, 291-311.
Brockwell, P. J., and Davis, R. A. (2016). Introduction to time series and forecasting, 3rd edition. Springer.
Fritz, M., Forstinger, S., Feng, Y., and Gries, T. (2020). Forecasting economic growth processes for developing economies. Unpublished.
Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.
Hyndman, R. J., and Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22:4, 679-688.
Lu, X., and Wang, L. (2020). Bootstrap prediction interval for ARMA models with unknown orders. REVSTAT–Statistical Journal, 18:3, 375-396.
Examples
lgdp <- log(smoots::gdpUS$GDP)
time <- seq(from = 1947.25, to = 2019.5, by = 0.25)
backtest <- rollCast(lgdp, K = 5,
argsPlot = list(x = time, xlim = c(2012, 2019.5), col = "forestgreen",
type = "b", pch = 20, lty = 2, main = "Example"))
backtest