tsmoothlm {esemifar} | R Documentation
Advanced Data-driven Nonparametric Regression for the Trend in Equidistant Time Series
Description
This function runs an iterative plug-in algorithm to find the optimal bandwidth for the estimation of the nonparametric trend in equidistant time series (with long-memory errors) and then employs the resulting bandwidth via either local polynomial or kernel regression.
Usage
tsmoothlm(
y,
pmin = c(0, 1, 2, 3, 4, 5),
pmax = c(0, 1, 2, 3, 4, 5),
qmin = c(0, 1, 2, 3, 4, 5),
qmax = c(0, 1, 2, 3, 4, 5),
p = c(1, 3),
mu = c(0, 1, 2, 3),
InfR = c("Opt", "Nai", "Var"),
bStart = 0.15,
bb = c(0, 1),
cb = 0.05,
method = c("lpr", "kr")
)
Arguments
y
a numeric vector that contains the time series ordered from past to present.
pmin
an integer value
pmax
an integer value
qmin
an integer value
qmax
an integer value
p
an integer
mu
an integer
InfR
a character object that represents the inflation rate in the form
bStart
a numeric object that indicates the starting value of the bandwidth for the iterative process; should be
bb
can be set to
cb
a numeric value that indicates the percentage of omitted observations on each side of the observation period for the automated bandwidth selection; is set to
method
the final smoothing approach;
Details
The trend is estimated based on the additive nonparametric regression model for an equidistant time series
y_t = m(x_t) + \epsilon_t,
where y_t is the observed time series, x_t is the rescaled time on the interval [0, 1], m(x_t) is a smooth and deterministic trend function and \epsilon_t are stationary errors with E(\epsilon_t) = 0 that are assumed to follow a FARIMA(p, d, q) model (see also Beran and Feng, 2002a, 2002b, 2002c).
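As an illustration of this model, the following base-R sketch simulates a series of the form y_t = m(x_t) + \epsilon_t with x_t = t/n. Everything specific here is an assumption chosen for demonstration only: the sine-shaped trend, the sample size, the value d = 0.2, the innovation standard deviation and the truncation lag J. The errors are a pure FARIMA(0, d, 0) process approximated by a truncated MA(infinity) representation, not the general FARIMA(p, d, q) case.

```r
set.seed(1)

n <- 500                       # number of observations (illustrative)
x <- (1:n) / n                 # rescaled time x_t = t/n on (0, 1]
m <- 2 + sin(2 * pi * x)       # hypothetical smooth deterministic trend

# MA(infinity) coefficients of FARIMA(0, d, 0), truncated at lag J:
# psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j
d <- 0.2                       # long-memory parameter (illustrative)
J <- 1000                      # truncation lag (illustrative)
psi <- cumprod(c(1, ((1:J) - 1 + d) / (1:J)))

# One-sided convolution of white noise with psi; the first J values
# are NA because the filter needs J past innovations, so drop them.
innov <- rnorm(n + J, sd = 0.1)
eps <- as.numeric(stats::filter(innov, psi, sides = 1))[(J + 1):(n + J)]

y <- m + eps                   # simulated series following the additive model
```

A vector such as `y` has the shape expected by the `y` argument: numeric and ordered from past to present.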
The iterative plug-in (IPI) algorithm, which numerically minimizes the asymptotic mean integrated squared error (AMISE), is based on the proposal of Beran and Feng (2002a).
The function calculates suitable estimates for c_f, the variance factor, and I[m^{(k)}] over different iterations. In each iteration, a bandwidth is obtained in accordance with the AMISE, and this bandwidth then serves as an input for the following iteration. The process repeats until either convergence or the 40th iteration is reached. For further details on the asymptotic theory or the algorithm, please see Letmathe et al. (2023).
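The control flow of such an iteration can be sketched in a few lines of base R. This is only a schematic of the loop structure (update, check convergence, stop at 40 iterations), not the package's implementation: `ipi_loop`, `update_bandwidth`, `toy_update` and the tolerance `tol` are hypothetical names and values, and the toy update rule stands in for the actual step that re-estimates c_f and I[m^{(k)}] and recomputes the bandwidth.

```r
# Schematic fixed-point loop; `update_bandwidth` is a hypothetical
# placeholder for one IPI step mapping the current bandwidth to the next.
ipi_loop <- function(b_start, update_bandwidth, max_iter = 40, tol = 1e-5) {
  b <- b_start
  path <- numeric(0)
  for (i in seq_len(max_iter)) {
    b_new <- update_bandwidth(b)
    path <- c(path, b_new)
    # stop once the relative change is negligible (assumed criterion)
    if (abs(b_new - b) / b_new < tol) break
    b <- b_new
  }
  list(b0 = b_new, iterations = path, niterations = length(path))
}

# Toy contraction with fixed point 0.1, used only to exercise the loop
toy_update <- function(b) 0.5 * b + 0.05

out <- ipi_loop(b_start = 0.15, update_bandwidth = toy_update)
out$b0
out$niterations
```

The names of the returned list elements mirror the `b0`, `iterations` and `niterations` components described in the Value section, purely for readability.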
To apply the function, the following arguments are needed: a data input y, an order of polynomial p, a kernel weighting function defined by the smoothness parameter mu, an inflation rate setting InfR (see also Beran and Feng, 2002b), a starting value for the relative bandwidth bStart, a boundary method bb, a boundary cut-off percentage cb and a final smoothing method method. In fact, aside from the input vector y, every argument has a default setting that can be adjusted for the individual case. Theoretically, the initial bandwidth does not affect the selected optimal bandwidth. However, in practice local minima of the AMISE might exist and influence the selected bandwidth. Therefore, the default setting is bStart = 0.15. In the rare case of a clearly unsuitable optimal bandwidth, a starting bandwidth that differs from the default value is a first possible approach to obtain a better result. Other argument adjustments can be tried as well. For more specific information on the input arguments, consult the section Arguments.
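A minimal sketch of the suggestion to try a different starting bandwidth, assuming the esemifar package is installed and using the gdpG7 data set from the Examples section. The value bStart = 0.1 is an arbitrary illustrative choice, not a recommendation, and the object names `fit_default` and `fit_alt` are hypothetical.

```r
if (requireNamespace("esemifar", quietly = TRUE)) {
  data("gdpG7", package = "esemifar")
  y <- log(gdpG7$gdp)

  # Same settings, once with the default starting bandwidth (0.15)
  # and once with an alternative starting value
  fit_default <- esemifar::tsmoothlm(y, p = 1, pmax = 1, qmax = 1)
  fit_alt     <- esemifar::tsmoothlm(y, p = 1, pmax = 1, qmax = 1,
                                     bStart = 0.1)

  # Compare the selected bandwidths of the two runs
  print(c(default = fit_default$b0, alternative = fit_alt$b0))
}
```

If both runs select essentially the same bandwidth, the starting value is unlikely to be the cause of an unsuitable fit.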
When applying the function, an optimal bandwidth is obtained based on a strongly modified version of the IPI algorithm of Beran and Feng (2002a). In a second step, the nonparametric trend of the series is calculated with respect to the chosen bandwidth and the selected regression method (lpr or kr). Please note that method = "lpr" is strongly recommended by the authors. Moreover, note that p is automatically set to 1 for method = "kr". The output object is then a list that contains, among other components, the original time series, the estimated trend values and the series without the trend.
The default print method for this function delivers only key numbers such as the iteration steps and the generated optimal bandwidth rounded to the fourth decimal. The exact numbers and results such as the estimated nonparametric trend series are saved within the output object and can be accessed via the $ operator.
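For instance, individual components can be pulled out of the fitted object as follows. This is a short sketch assuming the esemifar package is installed; it reuses the call and the gdpG7 data set from the Examples section, and the component names are those listed in the Value section.

```r
if (requireNamespace("esemifar", quietly = TRUE)) {
  data("gdpG7", package = "esemifar")
  y <- log(gdpG7$gdp)
  result <- esemifar::tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt")

  print(result$b0)           # selected bandwidth, unrounded
  print(result$niterations)  # number of iterations carried out
  print(head(result$ye))     # first estimated trend values
  print(head(result$res))    # first values of the estimated residual series
}
```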
Value
The function returns a list with different components:
- FARIMA.BIC
the Bayesian Information Criterion of the optimal FARIMA(p,d,q) model.
- cb
the percentage of omitted observations on each side of the observation period.
- b0
the optimal bandwidth chosen by the IPI-algorithm.
- bb
the boundary bandwidth method used within the IPI; always equal to 1.
- bStart
the starting value of the (relative) bandwidth; input argument.
- cf0
the estimated variance factor; in contrast to the definitions given in the Details section, this object actually contains an estimated value of 2\pi c_f, i.e. it corresponds to the estimated sum of autocovariances.
- d.BIC
the long-memory parameter of the optimal FARIMA(p,d,q) model.
- FARMA.BIC
the model fit of the selected FARIMA(p,d,q) model.
- I2
the estimated value of I[m^{(k)}].
- InfR
the setting for the inflation rate according to the chosen algorithm.
- iterations
the bandwidths of the individual iteration steps.
- mu
the smoothness parameter of the second order kernel; input argument.
- n
the number of observations.
- niterations
the total number of iterations until convergence.
- orig
the original input series; input argument.
- p.BIC
the order p of the optimal FARIMA(p,d,q) model.
- p
the order of polynomial used in the IPI-algorithm; also used for the final smoothing, if method = "lpr"; input argument.
- q.BIC
the order q of the optimal FARIMA(p,d,q) model.
- res
the estimated residual series.
- v
the considered order of derivative of the trend; is always zero for this function.
- ws
the weighting system matrix used within the local polynomial regression; this matrix is a condensed version of a complete weighting system matrix; in each row of ws, the weights for conducting the smoothing procedure at a specific observation time point can be found; the first [nb + 0.5] rows, where n corresponds to the number of observations, b is the bandwidth considered for smoothing and [.] denotes the integer part, contain the weights at the [nb + 0.5] left-hand boundary points; the weights in row [nb + 0.5] + 1 are representative for the estimation at all interior points and the remaining rows contain the weights for the right-hand boundary points; each row has exactly 2[nb + 0.5] + 1 elements, more specifically the weights for observations of the nearest 2[nb + 0.5] + 1 time points; moreover, the weights are normalized, i.e. the weights are obtained under consideration of the time points x_t = t/n, where t = 1, 2, ..., n.
- ye
the nonparametric estimates of the trend.
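To make the condensed layout of ws more concrete, the following base-R sketch maps an observation index t to the row of ws that would hold its weights. The helper `ws_row_index` is hypothetical and not part of the package. It additionally assumes that there are exactly [nb + 0.5] right-hand boundary rows and that they are stored in time order, which the description above suggests but does not state explicitly.

```r
# Hypothetical helper: which row of the condensed matrix ws applies
# to observation t, given n observations and bandwidth b?
ws_row_index <- function(t, n, b) {
  h <- floor(n * b + 0.5)        # [nb + 0.5], the integer part
  if (t <= h) {
    t                            # left-hand boundary point: rows 1, ..., h
  } else if (t > n - h) {
    2 * h + 1 - (n - t)          # right-hand boundary point (assumed ordering)
  } else {
    h + 1                        # any interior point shares one row
  }
}

# Illustration with n = 100 and b = 0.1, so that h = 10 and each row
# of ws would have 2 * h + 1 = 21 elements
sapply(c(1, 10, 11, 50, 90, 91, 100), ws_row_index, n = 100, b = 0.1)
```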
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
References
Beran, J. and Feng, Y. (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics, 11(3), 690-713.
Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, long-range dependence and nonstationarity. Computational Statistics & Data Analysis, 40(2), 393-419.
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
Examples
### Example 1: G7-GDP ###
# Logarithm of test data
# -> the logarithm of the data is assumed to follow the additive model
test_data <- gdpG7
y <- log(test_data$gdp)
n <- length(y)
# Apply tsmoothlm to estimate the trend
result <- tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt")
trend1 <- result$ye
# Plot of the results
t <- seq(from = 1962, to = 2020, length.out = n)
plot(t, y, type = "l", xlab = "Year", ylab = "log(G7-GDP)", bty = "n",
     lwd = 1, lty = 3,
     main = "Estimated trend for log-quarterly G7-GDP, Q1 1962 - Q4 2019")
points(t, trend1, type = "l", col = "red", lwd = 1)
title(sub = expression(italic("Figure 1")), col.sub = "gray47",
      cex.sub = 0.6, adj = 0)
result