diss.PRED {TSclust} | R Documentation |
Dissimilarity Measure Based on Nonparametric Forecast
Description
Computes the dissimilarity between two time series as the L1 distance between the kernel estimators of their forecast densities at a pre-specified horizon.
Usage
diss.PRED(x, y, h, B=500, logarithm.x=FALSE, logarithm.y=FALSE,
differences.x=0, differences.y=0, plot=FALSE, models = NULL)
Arguments
x |
Numeric vector containing the first of the two time series. |
y |
Numeric vector containing the second of the two time series. |
h |
The horizon of interest, i.e., the number of steps ahead at which the prediction is evaluated. |
B |
The number of bootstrap resamples. |
logarithm.x |
Boolean. Specifies whether to transform series x by taking logarithms or not. When using |
logarithm.y |
Boolean. Specifies whether to transform series y by taking logarithms or not. When using |
differences.x |
Specifies the number of differences to apply to series x. When using |
differences.y |
Specifies the number of differences to apply to series y. When using |
plot |
If |
models |
A list containing either |
Details
The dissimilarity between the time series x
and y
is given by
d(x,y) = \int{ | f_{x,h}(u) - f_{y,h}(u) | du}
where f_{x,h}
and f_{y,h}
are kernel density estimators of the forecast densities h-steps ahead of x
and y
, respectively. The horizon of interest h is pre-specified by the user.
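As a rough illustration of the formula above, the following sketch approximates the L1 distance between two kernel density estimates using only base R. It is not the internal code of diss.PRED: the samples fx and fy are hypothetical stand-ins for bootstrap forecast replicates, and the common grid, the padding rule, and the trapezoidal integration are assumptions made for this example.

```r
# Illustrative only: two samples standing in for h-step-ahead forecast replicates.
set.seed(1)
fx <- rnorm(500, mean = 0, sd = 1)   # hypothetical replicates for series x
fy <- rnorm(500, mean = 1, sd = 1)   # hypothetical replicates for series y

# Evaluate both kernel density estimates on one common grid,
# padded so that the tails of both estimates are covered.
rng <- range(c(fx, fy))
pad <- 3 * max(bw.nrd0(fx), bw.nrd0(fy))
dx <- density(fx, from = rng[1] - pad, to = rng[2] + pad, n = 512)
dy <- density(fy, from = rng[1] - pad, to = rng[2] + pad, n = 512)

# Trapezoidal approximation of the integral of |f_x(u) - f_y(u)| du.
gap <- abs(dx$y - dy$y)
l1 <- sum(diff(dx$x) * (head(gap, -1) + tail(gap, -1)) / 2)
l1
```

Because each density integrates to one, the resulting value lies between 0 (identical forecast densities) and 2 (densities with no overlap).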
If models
is specified, the given model for each series is used for obtaining
the forecast densities. Currently, each element of the models
list can be the string "ets"
, which fits an ets model using the function ets
in the forecast
package. If the element of models
is the string "arima", an ARIMA model is fitted using auto.arima
from the forecast package. Finally, an element of models can be a model already fitted to the series with a method from the forecast
package that supports simulation; see simulate.ets
.
The kernel density estimators are based on B bootstrap replicates obtained with a resampling procedure that mimics the generating processes, which are assumed to follow an arbitrary autoregressive structure (parametric or non-parametric). The procedure is described in full in Vilar et al. (2010). This function has a high computational cost because of the bootstrap procedure.
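To give a feel for how bootstrap replicates lead to a forecast density, the sketch below uses a simple residual bootstrap on a fitted AR(1) model. This is a deliberately simplified, parametric stand-in and not the procedure of Vilar et al. (2010) implemented by diss.PRED; the toy series, the AR(1) order, and the names phi, mu, res and boot_forecast are all assumptions made for illustration.

```r
set.seed(42)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 200))  # toy stationary series
h <- 3     # forecast horizon
B <- 200   # number of bootstrap replicates

# Fit an AR(1) model (Yule-Walker) and collect centred residuals.
fit <- ar(x, order.max = 1, aic = FALSE)
phi <- fit$ar
mu  <- fit$x.mean
res <- fit$resid[!is.na(fit$resid)]
res <- res - mean(res)

# Each replicate iterates the fitted recursion h steps ahead,
# drawing one resampled residual per step.
boot_forecast <- replicate(B, {
  last <- x[length(x)]
  for (step in seq_len(h)) {
    last <- mu + phi * (last - mu) + sample(res, 1)
  }
  last
})

# Kernel density estimate of the h-step-ahead forecast, as a 2-column matrix.
dens <- density(boot_forecast)
dens_matrix <- cbind(dens$x, dens$y)
head(dens_matrix)
```

The cost grows with both B and h, which is consistent with the note above that the bootstrap dominates the running time.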
The procedure uses a bootstrap method that requires stationary time series. To support a wider range of time series, the method allows some transformations of the series before the bootstrap resampling. These transformations are inverted before the densities are calculated. The transformations allowed are logarithm and differencing.
The parameters logarithm.x
, logarithm.y
, differences.x
, differences.y
can be specified for this purpose.
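The idea of transforming a series and inverting the transformation afterwards can be sketched with base R alone. The order used here (logarithm first, then differencing) and the toy values are assumptions for illustration; the sketch only shows that the two transformations are invertible, not how diss.PRED applies them internally.

```r
x <- c(2.0, 2.5, 3.1, 2.8, 3.6, 4.0)   # strictly positive toy series

# Forward: logarithm, then one regular difference.
lx <- log(x)
w  <- diff(lx, differences = 1)

# Inverse: undo the difference (this needs the initial value), then exponentiate.
x_back <- exp(diffinv(w, differences = 1, xi = lx[1]))

# Compare the recovered series with the original one.
all.equal(x_back, x)
```

The logarithm requires strictly positive values, which is why the example in this page shifts the simulated series before requesting a logarithm transform.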
When using the diss
function with the "PRED" method
, the argument logarithms
must be used instead of logarithm.x
and logarithm.y
. logarithms
is a boolean vector specifying, for each series, whether the logarithm transform should be applied
. The argument differences
, a numeric vector specifying the number of differences to apply to each series
, is used instead of differences.x
and differences.y
. The plot also differs: all the densities are shown in the same plot.
Value
diss.PRED
returns a list with the following components.
L1dist |
The computed distance. |
dens.x |
A 2-column matrix with the density of prediction of series |
dens.y |
A 2-column matrix with the density of prediction of series |
When used from the diss
wrapper function, it returns a list with the following components.
dist |
A |
densities |
A list of 2-column matrices containing the densities of each series, in the same format as 'dens.x' or 'dens.y' of |
Author(s)
José Antonio Vilar, Pablo Montero Manso.
References
Alonso, A.M., Berrendero, J.R., Hernandez, A. and Justel, A. (2006) Time series clustering based on forecast densities. Comput. Statist. Data Anal., 51, 762–776.
Vilar, J.A., Alonso, A. M. and Vilar, J.M. (2010) Non-linear time series clustering based on non-parametric forecast densities. Comput. Statist. Data Anal., 54 (11), 2850–2865.
Montero, P. and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1–43. http://www.jstatsoft.org/v62/i01/.
See Also
diss
, auto.arima
, ets
, simulate.ets
Examples
x <- rnorm(100)
# shift to produce values greater than 0, so the logarithm transform is valid
x <- x + abs(min(x)) + 1
y <- rnorm(100)
z <- sin(seq(0, pi, length.out=100))
## Compute the distance and check for coherent results
diss.PRED(x, y, h=6, logarithm.x=FALSE, logarithm.y=FALSE, differences.x=1, differences.y=0)
## Create a dist object for use with clustering functions like pam or hclust
diss( rbind(x,y,z), METHOD="PRED", h=3, B=200,
      logarithms=c(TRUE,FALSE, FALSE), differences=c(1,1,2) )
## Obtain the forecast densities from models of the forecast package
diss.PRED(x,y, h=5, models = list("ets", "arima"))