deseats {deseats}R Documentation

Locally Weighted Regression for Trend and Seasonality in Equidistant Time Series under Short Memory

Description

Simultaneously estimate the trend and the seasonality via locally weighted regression in an equidistant time series under short memory. The default setting uses an iterative plug-in algorithm for identifying the asymptotically globally optimal bandwidth for smoothing.

Usage

deseats(
  y,
  smoothing_options = set_options(),
  bwidth_start = NULL,
  inflation_rate = c("optimal", "naive"),
  correction_factor = FALSE,
  autocor = TRUE,
  drop = NULL,
  error_model = c("free", "ARMA"),
  nar_lim = c(0, 3),
  nma_lim = c(0, 3),
  arma_mean = FALSE
)

Arguments

y

a numerical vector or a time series object of class ts or that can be transformed with as.ts to an object of class ts; for these observations, trend and seasonality will be obtained.

smoothing_options

an S4 object of class smoothing_options, which is returned by the function set_options; it includes details about the options to consider in the locally weighted regression such as the order of polynomial and the bandwidth for smoothing among others.

bwidth_start

a single numeric value that is only relevant if the slot bwidth in smoothing_options is set to NA; as the bandwidth will then be selected automatically, bwidth_start sets the initial bandwidth for the algorithm; the default, bwidth_start = NULL, corresponds to bwidth_start = 0.1 for a local linear trend and to bwidth_start = 0.2 for a local cubic trend.

inflation_rate

a character vector of length one that indicates, which inflation rate to use in the bandwidth selection; for a local linear trend, we have inflation_rate = "optimal" as the default, for a local cubic trend it is inflation_rate = "naive", which correspond to inflation rates of 5/7 and 9/13, respectively.

correction_factor

A logical vector of length one; theoretically, a larger bandwidth to estimate the sum of autocovariances from residuals of pilot trend and seasonality estimates is advisable than for estimating trend and seasonality; for correction_factor = TRUE, this is implemented; for error_model = "ARMA", correction_factor = FALSE is enforced; the default is correction_factor = FALSE, because it was found that setting this argument to TRUE often overestimates the bandwidth.

autocor

a logical vector of length one; indicates whether to consider autocorrelated errors (TRUE) or independent but identically distributed errors (FALSE); the default is autocor = TRUE.

drop

a numeric vector of length one that indicates the proportion of the observations to not include at each boundary in the bandwidth estimation process, if a bandwidth is selected automatically; the default is drop = NULL, which corresponds to drop = 0.05 for a local linear trend and to drop = 0.1 for a local cubic trend.

error_model

a character vector of length one that indicates whether for autocor = TRUE the sum of autocovariances of the errors is obtained purely nonparametrically ("free") or whether an autoregressive moving-average (ARMA) model is assumed "ARMA"; the default is error_model = "free".

nar_lim

only valid for error_model = "ARMA"; set the minimum and maximum AR order to check via the BIC in each iteration of the algorithm via a two-element vector.

nma_lim

only valid for error_model = "ARMA"; set the minimum and maximum MA order to check via the BIC in each iteration of the algorithm via a two-element vector.

arma_mean

only valid for error_model = "ARMA"; decide whether to include an estimate of the mean in the ARMA fitting for the detrended series.

Details

Trend and seasonality are estimated based on the additive nonparametric regression model for an equidistant time series

y_t = m(x_t) + s(x_t) + \epsilon_t,

where y_t is the observed time series with t=1,...n, x_t = t / n is the rescaled time on the interval [0, 1], m(x_t) is a smooth and deterministic trend function, s(x_t) is a (slowly changing) seasonal component with seasonal period p_s and \epsilon_t are stationary errors with E(\epsilon_t) = 0 and short-range dependence (see for example also Feng, 2013, for a similar model, where the stochastic term is however i.i.d.).

It is assumed that m and s can be approximated locally by a polynomial of small order and by a trigonometric polynomial, respectively. Through locally weighted regression, m and s can therefore be estimated given a suitable bandwidth.

The iterative-plug-in (IPI) algorithm, which numerically minimizes the asymptotic mean squared error (AMISE) to select a bandwidth is an extension of Feng (2013) to the case with short-range dependence in the errors. To achieve this goal, the error variance in the AMISE in Feng (2013) is replaced by the sum of autocovariances of the error process and this quantity is being estimated using a slightly adjusted version of the Bühlmann (1996) algorithm. This procedure is similar to the method described in Feng, Gries and Fritz (2020), where data-driven local polynomial regression with an automatically selected bandwidth is used to estimate the trend according to a model without seasonality and where the same adjusted Bühlmann (1996) algorithm is considered to estimate the sum of autocovariances in the error process.

Define I[m^{(k)}] = \int_{c_b}^{d_b} [m^{(k)}(x)]^2 dx, \beta_{(k)} = \int_{-1}^{1} u^k K(u) du and R(K) = \int_{-1}^{1} K^{2}(u) du, where p is the order of the (local) polynomial considered for m, k = p + 1 is the order of the asymptotically equivalent kernel K for estimating m, 0 \leq c_{b}< d_{b} \leq 1, and c_f is the variance factor, i.e. the sum of autocovariances divided by 2\pi.

Furthermore, we define

C_{1} = \frac{I[m^{(k)}] \beta_{(k)}^2}{(k!)^2}

and

C_{2} = \frac{2 \pi c_{f} (d_b - c_b)[R(K) + (p_s - 1) R(W)]}{nh}

with h being the bandwidth, n being the number of observations and W being the weighting function considered in the weighted least squares approach, for example a second-order kernel function with support on [-1,1]. The AMISE is then

AMISE(h) = h^{2k}C_{1} + C_{2}.

The function calculates suitable estimates for c_f, the variance factor, and I[m^{(k)}] over different iterations. In each iteration, a bandwidth is obtained in accordance with the AMISE that once more serves as an input for the following iteration. The process repeats until either convergence or the 40th iteration is reached. For further details on the asymptotic theory or the algorithm, please consult Feng, Gries and Fritz (2020) or Feng et al. (2019).

To apply the function, only few arguments are needed: a data input y, an object with smoothing options smoothing_options returned by set_options and a starting value for the relative bandwidth bwidth_start. Aside from y, each argument has a default setting. By default, a local linear trend is considered. In some cases, a local cubic trend may, however, be more suitable. For more specific information on the input arguments consult the section Arguments.

When applying the function, an optimal bandwidth is obtained based on the IPI algorithm proposed by Feng, Gries and Fritz (2020). In a second step, the nonparametric trend of the series and the seasonality are calculated with respect to the chosen bandwidth.

Note that with this function m(x_t) and s(x_t) can be estimated without a parametric model assumption for the error series. Thus, after estimating and removing the trend and the seasonality, any suitable parametric model, e.g. an ARMA(p,q) model for errors = "autocor", can be fitted to the residuals (see arima).

Usually, a local cubic trend (smoothing_options = set_options(order_poly = 3)) gives more suitable results. Moreover, if the resulting bandwidth is too large, adjustments to the arguments inflation_rate and drop should be tried first in that order before any other changes to the input arguments.

The default print method for this function delivers key numbers such as the bandwidth considered for smoothing.

NOTE:

This function implements C++ code by means of the Rcpp and RcppArmadillo packages for better performance.

Value

The function returns and S4 object with the following elements (access them via @):

boundary_method

the applied boundary method.

bwidth

the globally applied bandwidth in the smoothing process; if not if no input is given in the function call, this is the automatically selected optimal bandwidth.

decomp

An object of class "mts" that consists of the decomposed time series data.

frequency

the frequency of the time series.

kernel_fun

the second-order kernel function considered for weighting.

order_poly

the order of polynomial considered locally for the trend.

order_poly

the order of polynomial considered locally for the trend.

sum_autocov

the estimated sum of autocovariances.

ts_name

the object name of the initially provided time series object.

weights_wfun

a matrix that gives the weights of the weighting function K at each estimation time point; ; if n is the length of the given time series and b is the applied (relative) bandwidth, then the first row of the weighting system gives the weighting function weights when estimating at t=1, the second row gives the weights when estimating at t=2 and so on for all left-hand side boundary points until the middle row, which contains the weights used at all interior points; the rows following the middle row contain the weights for right-hand side boundary points (the rows are ordered chronologically)

weights

an array with many slices that represent the weighting systems for various filters; each slice is a matrix, which gives the weighting system to estimate a component, for example trend + seasonality, as a weighted average from the given time series; if n is the length of the given time series and b is the applied (relative) bandwidth, then the first row of the weighting system gives the weights to obtain estimates at t=1, the second row gives the weights to obtain estimates at t=2 and so on for all left-hand side boundary points until the middle row, which contains the weights used at all interior points; the rows following the middle row contain the weights for right-hand side boundary points (the rows are ordered chronologically); the slice names are "Trend", "Season" and "Combined", where "Combined" are the weights to estimate trend + seasonality combined.

Author(s)

References

Examples


Xt <- log(EXPENDITURES)
smoothing_options <- set_options(order_poly = 3)
est <- deseats(Xt, smoothing_options = smoothing_options)
est
plot(est, which = 1)



[Package deseats version 1.1.0 Index]