diss.MINDIST.SAX {TSclust} | R Documentation |
Symbolic Aggregate Approximation related functions
Description
diss.MINDIST.SAX
computes a dissimilarity that lower bounds the Euclidean distance on the discretized, dimensionality-reduced series. Function PAA
produces the dimension reduction. Function convert.to.SAX.symbol
produces the discretization.
Usage
diss.MINDIST.SAX(x, y, w, alpha=4, plot=FALSE)
PAA(x, w)
convert.to.SAX.symbol(x, alpha)
MINDIST.SAX(x, y, alpha, n)
SAX.plot(series, w, alpha, col.ser=rainbow(ncol(as.matrix(series))))
Arguments
x |
Numeric vector containing the first of the two time series. |
y |
Numeric vector containing the second of the two time series. |
w |
The number of equal-sized frames that the series will be reduced to. |
alpha |
The size of the alphabet, i.e. the number of symbols used to represent the values of the series. |
plot |
If |
n |
The original size of the series. |
series |
A |
col.ser |
Colors for the series. One per series. |
Details
SAX is a symbolic representation of continuous time series.
w
must be an integer but it does not need to divide the length of the series. If w
divides the length of the series, the diss.MINDIST.SAX
plot uses this to show the size of the frames.
PAA
performs the Piecewise Aggregate Approximation of the series, reducing it to w
elements, called frames. Each frame is composed of n/w
observations of the original series, averaged. Observations are weighted when w
does not divide n
.
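The weighted averaging described above can be sketched in base R as follows. This is a minimal illustration, assuming that each observation contributes to a frame in proportion to its overlap with that frame. The name paa_sketch is hypothetical; this is not the package's implementation of PAA, and results may differ in detail.

```r
# Hypothetical helper, not part of TSclust: PAA via replication.
paa_sketch <- function(x, w) {
  n <- length(x)
  # Repeat each observation w times, giving n * w values, then cut them
  # into w frames of n values each. An observation that straddles two
  # frames contributes to each in proportion to its overlap.
  colMeans(matrix(rep(x, each = w), nrow = n))
}

paa_sketch(1:6, 3)  # w divides n: plain means of consecutive pairs
paa_sketch(1:3, 2)  # w does not divide n: the middle value is shared
```

When w divides n, this reduces to the plain mean of each block of n/w consecutive observations.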
convert.to.SAX.symbol
performs SAX discretization: it discretizes the series x
to an alphabet of size alpha
; x
should be z-normalized beforehand. The N(0,1)
distribution is divided into alpha
equal-probability parts; if an observation falls into the i
th part (starting from minus infinity), it is assigned the i
th symbol.
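The discretization rule can be illustrated with a short base R sketch. It assumes that symbols are coded as the integers 1 to alpha and that a value lying exactly on a breakpoint goes to the upper part; the name sax_symbol_sketch is hypothetical, and the output format and boundary handling of convert.to.SAX.symbol may differ.

```r
# Hypothetical helper, not part of TSclust.
sax_symbol_sketch <- function(x, alpha) {
  # alpha - 1 breakpoints splitting N(0,1) into alpha equal-probability parts
  breaks <- qnorm(seq_len(alpha - 1) / alpha)
  # findInterval() counts the breakpoints at or below each value;
  # adding 1 gives a symbol index in 1..alpha
  findInterval(x, breaks) + 1
}

# With alpha = 4 the breakpoints are roughly -0.67, 0 and 0.67
sax_symbol_sketch(c(-1, -0.1, 0.1, 1), alpha = 4)
```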
MINDIST.SAX
calculates the MINDIST dissimilarity between symbolic representations.
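In Lin et al. (2003), MINDIST between two symbolic words of length w is sqrt(n/w) * sqrt(sum_i dist(q_i, c_i)^2), where dist is 0 for equal or adjacent symbols and otherwise the gap between the breakpoints that separate them. The sketch below follows that definition in base R, assuming symbols coded as integers 1 to alpha and equal-probability N(0,1) breakpoints. The name mindist_sketch is hypothetical, and the package's MINDIST.SAX may differ in implementation details.

```r
# Hypothetical helper, not part of TSclust.
mindist_sketch <- function(sx, sy, alpha, n) {
  w <- length(sx)                       # number of frames (word length)
  breaks <- qnorm(seq_len(alpha - 1) / alpha)
  hi <- pmax(sx, sy)
  lo <- pmin(sx, sy)
  d <- numeric(w)                       # equal or adjacent symbols: distance 0
  far <- (hi - lo) > 1                  # symbols at least two parts apart
  # gap between the lower edge of the higher symbol's part
  # and the upper edge of the lower symbol's part
  d[far] <- breaks[hi[far] - 1] - breaks[lo[far]]
  sqrt(n / w) * sqrt(sum(d^2))
}

# Only the first position (symbols 1 and 4) contributes
mindist_sketch(c(1, 1, 2), c(4, 2, 2), alpha = 4, n = 12)
```

Because adjacent symbols count as distance 0, two series with clearly different values can still obtain a MINDIST of 0; this is what makes the measure a lower bound rather than an approximation of the Euclidean distance.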
diss.MINDIST.SAX
combines the previous procedures to compute a dissimilarity between series. The series are first z-normalized. Then the dimensionality is reduced using PAA
to produce series of length w
. The series are discretized to an alphabet of size alpha
using convert.to.SAX.symbol
. Finally, the dissimilarity value is produced using MINDIST.SAX
.
SAX.plot
produces a plot of the SAX representation of the given series
.
Value
The computed dissimilarity.
Author(s)
Pablo Montero Manso, José Antonio Vilar.
References
Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
Keogh, E., Chakrabarti, K., Pazzani, M., & Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263-286.
Montero, P and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.
See Also
Examples
set.seed(12349)
n <- 100
x <- rnorm(n) #generate sample series, white noise and a wiener process
y <- cumsum(rnorm(n))
w <- 20 #amount of equal-sized frames to divide the series, parameters for PAA
alpha <- 4 #size of the alphabet, parameter for SAX
#normalize
x <- (x - mean(x)) /sd(x)
y <- (y - mean(y)) /sd(y)
paax <- PAA(x, w) #generate PAA reductions
paay <- PAA(y, w)
plot(x, type="l", main="PAA reduction of series x") #plot an example of PAA reduction
p <- rep(paax,each=length(x)/length(paax)) #just for plotting the PAA
lines(p, col="red")
#repeat the example with y
plot(y, type="l", main="PAA reduction of series y")
py <- rep(paay,each=length(y)/length(paay))
lines(py, col="blue")
#convert to SAX representation
SAXx <- convert.to.SAX.symbol( paax, alpha)
SAXy <- convert.to.SAX.symbol( paay, alpha)
#calculate the SAX distance
MINDIST.SAX(SAXx, SAXy, alpha, n)
#this whole process can be computed using diss.MINDIST.SAX
diss.MINDIST.SAX(x, y, w, alpha, plot=TRUE)
z <- rnorm(n)^2
diss(rbind(x,y,z), "MINDIST.SAX", w, alpha)
SAX.plot( as.ts(cbind(x,y,z)), w=w, alpha=alpha)