R: Symbolic Aggregate Aproximation related functions

diss.MINDIST.SAX {TSclust}

R Documentation

Symbolic Aggregate Aproximation related functions

Description

diss.MINDIST.SAX computes a dissimilarity that lower bounds the Euclidean on the discretized, dimensionality reduced series. Function PAA produces the dimension reduction. Function convert.to.SAX.symbol produces the discretization.

Usage

diss.MINDIST.SAX(x, y, w, alpha=4, plot=FALSE)
PAA(x, w)
convert.to.SAX.symbol(x, alpha)
MINDIST.SAX(x, y, alpha, n)
SAX.plot(series, w, alpha, col.ser=rainbow(ncol(as.matrix(series))))

Arguments

`x`	Numeric vector containing the first of the two time series.
`y`	Numeric vector containing the second of the two time series.
`w`	The amount of equal sized frames that the series will be reduced to.
`alpha`	The size of the alphabet, the amount of symbols used to represents the values of the series.
`plot`	If `TRUE`, plot a graphic of the reduced series, with their corresponding symbols.
`n`	The original size of the series.
`series`	A `ts` or `mts` object with the series to plot.
`col.ser`	Colors for the series. One per series.

Details

SAX is a symbolic representation of continuous time series.

w must be an integer but it does not need to divide the length of the series. If w divides the length of the series, the diss.MINDIST.SAX plot uses this to show the size of the frames.

PAA performs the Piecewise Aggregate Approximation of the series, reducing it to w elements, called frames. Each frame is composed by n/w observations of the original series, averaged. Observations are weighted when w does not divide n.

convert.to.SAX.symbol performs SAX discretization: Discretizes the series x to an alphabet of size alpha, x should be z-normalized in this case. The N(0,1) distribution is divided in alpha equal probability parts, if an observation falls into the ith part (starting from minus infinity), it is assigned the i symbol.

MINDIST.SAX calculates the MINDIST dissimilarity between symbolic representations.

diss.MINDIST.SAX combines the previous procedures to compute a dissimilarity between series. The series are z-normalized at first. Then the dimensionality is reduced uusin PAA to produce series of length w. The series are discretized to an alphabet of size alpha using convert.to.SAX.symbol. Finally the dissimilarity value is produced using MINDIST.SAX.

SAX.plot produces a plot of the SAX representation of the given series.

Value

The computed dissimilarity.

Author(s)

Pablo Montero Manso, José Antonio Vilar.

References

Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

Keogh, E., Chakrabarti, K., Pazzani, M., & Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems, 3(3), 263-286.

Montero, P and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.

Examples

set.seed(12349)
n = 100
x <- rnorm(n)  #generate sample series, white noise and a wiener process
y <- cumsum(rnorm(n))
w <- 20 #amount of equal-sized frames to divide the series, parameters for PAA
alpha <- 4 #size of the alphabet, parameter for SAX

#normalize
x <- (x - mean(x)) /sd(x)
y <- (y - mean(y)) /sd(y)

paax <- PAA(x, w) #generate PAA reductions
paay <- PAA(y, w)

plot(x, type="l", main="PAA reduction of series x") #plot an example of PAA reduction
p <- rep(paax,each=length(x)/length(paax)) #just for plotting the PAA
lines(p, col="red")

#repeat the example with y
plot(y, type="l", main="PAA reduction of series y") 
py <- rep(paay,each=length(y)/length(paay))
lines(py, col="blue")

#convert to SAX representation
SAXx <- convert.to.SAX.symbol( paax, alpha)
SAXy <- convert.to.SAX.symbol( paay, alpha)

#CALC THE SAX DISTANCE
MINDIST.SAX(SAXx, SAXy, alpha, n)

#this whole process can be computed using diss.MINDIST.SAX
diss.MINDIST.SAX(x, y, w, alpha, plot=TRUE)

z <- rnorm(n)^2

diss(rbind(x,y,z), "MINDIST.SAX", w, alpha)

SAX.plot( as.ts(cbind(x,y,z)), w=w, alpha=alpha)

[Package TSclust version 1.3.1 Index]