
tsmoothlm {esemifar}

R Documentation

Advanced Data-driven Nonparametric Regression for the Trend in Equidistant Time Series

Description

This function runs an iterative plug-in algorithm to find the optimal bandwidth for the estimation of the nonparametric trend in equidistant time series (with long-memory errors) and then employs the resulting bandwidth via either local polynomial or kernel regression.

Usage

tsmoothlm(
  y,
  pmin = c(0, 1, 2, 3, 4, 5),
  pmax = c(0, 1, 2, 3, 4, 5),
  qmin = c(0, 1, 2, 3, 4, 5),
  qmax = c(0, 1, 2, 3, 4, 5),
  p = c(1, 3),
  mu = c(0, 1, 2, 3),
  InfR = c("Opt", "Nai", "Var"),
  bStart = 0.15,
  bb = c(0, 1),
  cb = 0.05,
  method = c("lpr", "kr")
)

Arguments

y

a numeric vector that contains the time series ordered from past to present.

pmin

an integer value >= 0 that defines the minimum autoregressive order for which the BIC (Bayesian Information Criterion) is calculated; is set to 0 by default; decimal numbers will be rounded off to integers.

pmax

an integer value >= 0 that defines the maximum autoregressive order for which the BIC is calculated; is set to 0 by default; decimal numbers will be rounded off to integers.

qmin

an integer value >= 0 that defines the minimum moving-average order for which the BIC is calculated; is set to 0 by default; decimal numbers will be rounded off to integers.

qmax

an integer value >= 0 that defines the maximum moving-average order for which the BIC is calculated; is set to 0 by default; decimal numbers will be rounded off to integers.

p

an integer 1 (local linear regression) or 3 (local cubic regression); represents the order of the polynomial used within the local polynomial regression (see also the 'Details' section); is set to 1 by default; is automatically set to 1 if method = "kr".

mu

an integer 0, ..., 3 that represents the smoothness parameter of the kernel weighting function and thus defines the kernel function that will be used within the local polynomial regression; is set to 1 by default.

Number	Kernel
`0`	Uniform Kernel
`1`	Epanechnikov Kernel
`2`	Bisquare Kernel
`3`	Triweight Kernel

InfR

a character object that represents the inflation rate in the form h_d = h^a for the bandwidth in the estimation of I[m^{(k)}] (see also the 'Details' section); is set to "Opt" by default.

Inflation rate	Description
`"Opt"`	Optimal inflation rate `a_{p,O}` (`(5-2d)/(7-2d)` for `p = 1`; `(9-2d)/(11-2d)` for `p = 3`)
`"Nai"`	Naive inflation rate `a_{p,N}` (`(5-2d)/(9-2d)` for `p = 1`; `(9-2d)/(13-2d)` for `p = 3`)
`"Var"`	Stable inflation rate `a_{p,S}` (`1/2` for `p = 1` and `p = 3`)

bStart

a numeric object that indicates the starting value of the bandwidth for the iterative process; should be > 0; is set to 0.15 by default.

bb

can be set to 0 or 1; the parameter controlling the bandwidth used at the boundary; is set to 1 by default.

Number (`bb`)	Estimation procedure at boundary points
`0`	Fixed bandwidth on one side with a possibly large bandwidth on the other side at the boundary
`1`	The `k`-nearest neighbor method will be used

cb

a numeric value that indicates the proportion of observations omitted on each side of the observation period for the automated bandwidth selection (e.g. 0.05 corresponds to 5%); is set to 0.05 by default.

method

the final smoothing approach; "lpr" represents the local polynomial regression, whereas "kr" implements a kernel regression; is set to "lpr" by default.
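The `mu` and `InfR` tables in this section can be made concrete with two small helper functions. This is a minimal base-R sketch, not code taken from esemifar: `kernel_mu()` and `inflation_rate()` are hypothetical names, and the kernel is assumed to be the symmetric beta family K(u) = C_mu (1 - u^2)^mu on [-1, 1], normalized to integrate to one. The inflation rates are transcribed from the `InfR` table.

```r
# Hypothetical helpers (not part of esemifar) illustrating the mu and InfR tables.

# Kernel weighting function, assumed to be K(u) = C_mu * (1 - u^2)^mu on [-1, 1]:
# mu = 0 uniform, 1 Epanechnikov, 2 bisquare, 3 triweight.
kernel_mu <- function(u, mu = 1) {
  stopifnot(mu %in% 0:3)
  c_mu <- 1 / beta(0.5, mu + 1)   # integral of (1 - u^2)^mu over [-1, 1] equals beta(1/2, mu + 1)
  ifelse(abs(u) <= 1, c_mu * (1 - u^2)^mu, 0)
}

# Inflation rate a in h_d = h^a for the long-memory parameter d (InfR table).
inflation_rate <- function(InfR = c("Opt", "Nai", "Var"), p = 1, d = 0) {
  InfR <- match.arg(InfR)
  stopifnot(p %in% c(1, 3))
  switch(InfR,
    Opt = if (p == 1) (5 - 2 * d) / (7 - 2 * d) else (9 - 2 * d) / (11 - 2 * d),
    Nai = if (p == 1) (5 - 2 * d) / (9 - 2 * d) else (9 - 2 * d) / (13 - 2 * d),
    Var = 1 / 2
  )
}

# Usage: Epanechnikov kernel at u = 0 and the inflated bandwidth for h = 0.1
kernel_mu(0, mu = 1)                      # 0.75
h <- 0.1
h^inflation_rate("Opt", p = 1, d = 0.2)   # larger than h, because 0 < a < 1
```

Since every listed rate satisfies 0 < a < 1 for 0 <= d < 0.5, the bandwidth h_d = h^a used for estimating I[m^{(k)}] is always larger than h whenever h < 1.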

Details

The trend is estimated based on the additive nonparametric regression model for an equidistant time series

y_t = m(x_t) + \epsilon_t,

where y_t is the observed time series, x_t is the rescaled time on the interval [0, 1], m(x_t) is a smooth and deterministic trend function and \epsilon_t are stationary errors with E(\epsilon_t) = 0 that are assumed to follow a FARIMA(p, d, q) model (see also Beran and Feng, 2002a, Beran and Feng, 2002b and Beran and Feng, 2002c). Here, p and q denote the autoregressive and moving-average orders of the error model, not the polynomial order argument p.
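To illustrate the model, the following base-R sketch simulates a series with a smooth trend plus FARIMA(0, d, 0) errors, using the MA(infinity) representation with coefficients psi_j = psi_{j-1} (j - 1 + d) / j, truncated after `L` terms. The helper `simulate_farima0d0()`, the trend function and all numeric settings are illustrative assumptions; this is not how esemifar itself generates or fits FARIMA errors, and AR and MA components are omitted for brevity.

```r
# Hypothetical helper: FARIMA(0, d, 0) errors via the truncated MA(infinity) form
# eps_t = sum_{j=0}^{L-1} psi_j z_{t-j}, psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j.
simulate_farima0d0 <- function(n, d, L = 1000) {
  stopifnot(L >= 2, d > -0.5, d < 0.5)
  psi <- numeric(L)
  psi[1] <- 1
  for (j in 1:(L - 1)) psi[j + 1] <- psi[j] * (j - 1 + d) / j
  z <- rnorm(n + L - 1)                                   # Gaussian innovations
  eps <- stats::filter(z, psi, method = "convolution", sides = 1)
  as.numeric(eps[L:(n + L - 1)])                          # drop the NA start-up values
}

# Usage: y_t = m(x_t) + eps_t on the rescaled time x_t = t/n
set.seed(123)
n <- 400
x <- (1:n) / n
m <- 2 + sin(2 * pi * x)              # an arbitrary smooth trend for illustration
y <- m + simulate_farima0d0(n, d = 0.3)
```

With d = 0 all coefficients after psi_0 vanish and the errors reduce to white noise; values of d in (0, 0.5) produce the slowly decaying autocorrelations that make bandwidth selection harder than in the short-memory case.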

The iterative plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Integrated Squared Error (AMISE), is based on the proposal of Beran and Feng (2002a).

The function calculates suitable estimates for c_f, the variance factor, and I[m^{(k)}] over different iterations. In each iteration, the AMISE-optimal bandwidth is computed from the current estimates and then serves as the input for the following iteration. The process repeats until either convergence or the 40th iteration is reached. For further details on the asymptotic theory or the algorithm, please see Letmathe et al. (2023).
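The iteration scheme described above is a fixed-point search on the bandwidth. The sketch below shows only that outer loop under stated assumptions: `update_b` is a hypothetical stand-in for one complete IPI step (re-estimating c_f and I[m^{(k)}] and recomputing the AMISE-optimal bandwidth), and the stopping rule |b_new - b_old| < tol is an assumed criterion, not esemifar's documented one. Only the cap of 40 iterations is taken from the text.

```r
# Hypothetical outer loop of an iterative plug-in (IPI) bandwidth search.
# update_b(b) stands in for one full IPI step; it is NOT an esemifar function.
ipi_iterate <- function(update_b, bStart = 0.15, max_iter = 40, tol = 1e-6) {
  b_old <- bStart
  path <- numeric(0)
  for (i in seq_len(max_iter)) {
    b_new <- update_b(b_old)
    path <- c(path, b_new)                 # record the bandwidth of each iteration
    if (abs(b_new - b_old) < tol) break    # assumed convergence criterion
    b_old <- b_new
  }
  list(b0 = b_new, iterations = path, niterations = length(path))
}

# Usage with a toy contraction whose fixed point is b = 0.1
res <- ipi_iterate(function(b) 0.5 * b + 0.05)
res$b0           # close to 0.1
res$niterations  # well below the cap of 40
```

The returned names mirror the `b0`, `iterations` and `niterations` components listed under Value, but their computation here is purely illustrative.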

To apply the function, the following arguments are needed: a data input y, the ranges of the FARIMA orders pmin, pmax, qmin and qmax, an order of polynomial p, a kernel weighting function defined by the smoothness parameter mu, an inflation rate setting InfR (see also Beran and Feng, 2002b), a starting value for the relative bandwidth bStart, a boundary method bb, a boundary cut-off percentage cb and a final smoothing method method. Apart from the input vector y, every argument has a default setting that can be adjusted for the individual case. Theoretically, the initial bandwidth does not affect the selected optimal bandwidth. In practice, however, the AMISE may have local minima that influence the selected bandwidth; the default starting value is bStart = 0.15. In the rare case of a clearly unsuitable optimal bandwidth, choosing a starting bandwidth that differs from the default value is a first possible approach to obtaining a better result. Other arguments can be adjusted as well. For more specific information on the input arguments, consult the section Arguments.

When applying the function, an optimal bandwidth is obtained based on a strongly modified version of the IPI algorithm of Beran and Feng (2002a). In a second step, the nonparametric trend of the series is calculated with the chosen bandwidth and the selected regression method (lpr or kr). Please note that method = "lpr" is strongly recommended by the authors. Moreover, p is automatically set to 1 for method = "kr". The output object is a list that contains, among other components, the original time series, the estimated trend values and the series without the trend.
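The difference between the two final smoothing options can be illustrated with a single-point estimate. In this hedged sketch, `local_fit()` is a hypothetical helper (not an esemifar function) that uses the rescaled time x_t = t/n, unnormalized kernel weights (1 - u^2)^mu, and either a weighted local polynomial fit of order p (`"lpr"`) or a Nadaraya-Watson weighted mean (`"kr"`). It ignores the boundary modifications controlled by `bb`.

```r
# Hypothetical helper (not part of esemifar): estimate m(x0) at a single point.
local_fit <- function(y, x0, b, p = 1, mu = 1, method = c("lpr", "kr")) {
  method <- match.arg(method)
  n <- length(y)
  x <- seq_len(n) / n                         # rescaled time x_t = t/n
  u <- (x - x0) / b
  w <- ifelse(abs(u) <= 1, (1 - u^2)^mu, 0)   # kernel weights; the constant cancels out
  if (method == "kr") {
    return(sum(w * y) / sum(w))               # Nadaraya-Watson (local constant) estimate
  }
  X <- outer(x - x0, 0:p, `^`)                # columns 1, (x - x0), ..., (x - x0)^p
  fit <- lm.wfit(X, y, w)                     # weighted least squares; zero weights are dropped
  unname(fit$coefficients[1])                 # intercept = estimate of m(x0)
}

# Usage: a linear trend evaluated at the first time point
n <- 200
y_lin <- 2 + 3 * (1:n) / n
local_fit(y_lin, x0 = 1 / n, b = 0.1, method = "lpr")  # 2.015 up to rounding
local_fit(y_lin, x0 = 1 / n, b = 0.1, method = "kr")   # noticeably larger (boundary bias)
```

At the first time point all kernel weights lie to the right of x0, so the local constant estimate is pulled towards later, larger values, while the local linear fit reproduces the linear trend exactly. This boundary behavior is a well-known advantage of local polynomial fitting over kernel regression.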

The default print method for this function displays only key numbers, such as the iteration steps and the selected optimal bandwidth rounded to four decimal places. The exact values and further results, such as the estimated nonparametric trend series, are stored in the output object and can be accessed via the $ operator.

Value

The function returns a list with different components:

FARIMA.BIC: the Bayesian Information Criterion of the optimal FARIMA(p,d,q) model.
cb: the percentage of omitted observations on each side of the observation period.
b0: the optimal bandwidth chosen by the IPI-algorithm.
bb: the boundary bandwidth method used within the IPI; always equal to 1.
bStart: the starting value of the (relative) bandwidth; input argument.
cf0: the estimated variance factor; in contrast to the definitions given in the Details section, this object actually contains an estimated value of 2\pi c_f, i.e. it corresponds to the estimated sum of autocovariances.
d.BIC: the long-memory parameter of the optimal FARIMA(p,d,q) model.
FARMA.BIC: the model fit of the selected FARIMA(p,d,q) model.
I2: the estimated value of I[m^{(k)}].
InfR: the setting for the inflation rate according to the chosen algorithm.
iterations: the bandwidths of the individual iteration steps.
mu: the smoothness parameter of the second order kernel; input argument.
n: the number of observations.
niterations: the total number of iterations until convergence.
orig: the original input series; input argument.
p.BIC: the order p of the optimal FARIMA(p,d,q) model.
p: the order of polynomial used in the IPI-algorithm; also used for the final smoothing, if method = "lpr"; input argument.
q.BIC: the order q of the optimal FARIMA(p,d,q) model.
res: the estimated residual series.
v: the considered order of derivative of the trend; is always zero for this function.
ws: the weighting system matrix used within the local polynomial regression. This matrix is a condensed version of a complete weighting system matrix: each row of ws contains the weights for conducting the smoothing procedure at a specific observation time point. Let n be the number of observations, b the bandwidth considered for smoothing and [.] the integer part. The first [nb + 0.5] rows contain the weights at the [nb + 0.5] left-hand boundary points, the weights in row [nb + 0.5] + 1 are representative for the estimation at all interior points, and the remaining rows contain the weights for the right-hand boundary points. Each row has exactly 2[nb + 0.5] + 1 elements, namely the weights for the observations at the nearest 2[nb + 0.5] + 1 time points. Moreover, the weights are normalized, i.e. they are obtained under consideration of the rescaled time points x_t = t/n, where t = 1, 2, ..., n.
ye: the nonparametric estimates of the trend.
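The layout of the condensed weighting matrix `ws` can be sketched as follows. This hedged example assumes that `ws` is square with 2k + 1 rows and columns, k = [nb + 0.5]; that rows 1, ..., k belong to the left-hand boundary points t = 1, ..., k, row k + 1 to every interior point, and the last k rows to the right-hand boundary points; and that boundary rows are applied to the first or last 2k + 1 observations, mimicking the k-nearest neighbor option `bb = 1`. The helper `apply_ws()` is hypothetical, and the uniform demonstration weights are a simple moving average, not the weights esemifar computes.

```r
# Hypothetical helper (not part of esemifar): expand a condensed weighting matrix
# ws with 2k + 1 rows and columns into trend estimates for all n observations.
apply_ws <- function(y, ws) {
  n <- length(y)
  k <- (ncol(ws) - 1) / 2
  stopifnot(nrow(ws) == 2 * k + 1, n >= 2 * k + 1)
  ye <- numeric(n)
  for (t in seq_len(n)) {
    if (t <= k) {                        # left boundary: row t, first 2k + 1 observations
      ye[t] <- sum(ws[t, ] * y[1:(2 * k + 1)])
    } else if (t > n - k) {              # right boundary: rows k + 2, ..., 2k + 1
      ye[t] <- sum(ws[2 * k + 1 - (n - t), ] * y[(n - 2 * k):n])
    } else {                             # interior: row k + 1 for every interior point
      ye[t] <- sum(ws[k + 1, ] * y[(t - k):(t + k)])
    }
  }
  ye
}

# Usage: n = 100 and b = 0.1 give k = [nb + 0.5] = 10, so ws is 21 x 21
n <- 100
b <- 0.1
k <- floor(n * b + 0.5)
ws_demo <- matrix(1 / (2 * k + 1), nrow = 2 * k + 1, ncol = 2 * k + 1)  # moving-average weights
ye <- apply_ws(1:n, ws_demo)
```

Storing only 2k + 1 distinct rows is what makes ws compact: the single interior row is reused for all n - 2k interior time points.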

Author(s)

Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author

References

Beran, J. and Feng, Y. (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics, 11(3), 690-713.

Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.

Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, long-range dependence and nonstationarity. Computational Statistics & Data Analysis, 40(2), 393-419.

Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods, 1-13.

Examples



### Example 1: G7-GDP ###

library(esemifar)

# Logarithm of test data
# -> the logarithm of the data is assumed to follow the additive model
test_data <- gdpG7
y <- log(test_data$gdp)
n <- length(y)

# Apply tsmoothlm to estimate the trend
result <- tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt")
trend1 <- result$ye

# Plot of the results (year is used instead of t to avoid masking base::t)
year <- seq(from = 1962, to = 2020, length.out = n)
plot(year, y, type = "l", xlab = "Year", ylab = "log(G7-GDP)", bty = "n",
 lwd = 1, lty = 3,
 main = "Estimated trend for log-quarterly G7-GDP, Q1 1962 - Q4 2019")
lines(year, trend1, col = "red", lwd = 1)
title(sub = expression(italic("Figure 1")), col.sub = "gray47",
 cex.sub = 0.6, adj = 0)

# Print key results (iteration steps and selected bandwidth)
result

[Package esemifar version 2.0.1 Index]