CWindowCluster {funtimes}R Documentation

Window-Level Time Series Clustering

Description

Cluster time series at a window level, based on Algorithm 2 of Ciampi et al. (2010).

Usage

CWindowCluster(
  X,
  Alpha = NULL,
  Beta = NULL,
  Delta = NULL,
  Theta = 0.8,
  p,
  w,
  s,
  Epsilon = 1
)

Arguments

X

a matrix of time series observed within a slide (time series in columns).

Alpha

lower limit of the time-series domain, passed to CSlideCluster.

Beta

upper limit of the time-series domain passed to CSlideCluster.

Delta

closeness parameter passed to CSlideCluster.

Theta

connectivity parameter passed to CSlideCluster.

p

number of layers (time-series observations) in each slide.

w

number of slides in each window.

s

step to shift a window, calculated in the number of slides. The recommended values are 1 (overlapping windows) or equal to w (non-overlapping windows).

Epsilon

a real value in [0,1] used to identify each pair of time series that are clustered together over at least w*Epsilon slides within a window; see Definition 7 by Ciampi et al. (2010). Default is 1.

Details

This is the upper-level function for time series clustering. It exploits the function CSlideCluster to cluster time series within each slide based on closeness and homogeneity measures. Then, it uses slide-level cluster assignments to cluster time series within each window.

The total length of time series (number of levels, i.e., nrow(X)) should be divisible by p.

Value

A vector (if X contains only one window) or matrix with cluster labels for each time series (columns) and window (rows).

Author(s)

Vyacheslav Lyubchich

References

Ciampi A, Appice A, Malerba D (2010). “Discovering trend-based clusters in spatially distributed data streams.” In International Workshop of Mining Ubiquitous and Social Environments, 107–122.

See Also

CSlideCluster, CWindowCluster, and BICC

Examples

#For example, weekly data come in slides of 4 weeks
p <- 4 #number of layers in each slide (data come in a slide)
    
#We want to analyze the trend clusters within a window of 1 year
w <- 13 #number of slides in each window
s <- w  #step to shift a window

#Simulate 26 autoregressive time series with two years of weekly data (52*2 weeks), 
#with a 'burn-in' period of 300.
N <- 26
T <- 2*p*w
    
set.seed(123) 
phi <- c(0.5) #parameter of autoregression
X <- sapply(1:N, function(x) arima.sim(n = T + 300, 
     list(order = c(length(phi), 0, 0), ar = phi)))[301:(T + 300),]
colnames(X) <- paste("TS", c(1:dim(X)[2]), sep = "")
 
tmp <- CWindowCluster(X, Delta = NULL, Theta = 0.8, p = p, w = w, s = s, Epsilon = 1)

#Time series were simulated with the same parameters, but based on the clustering parameters,
#not all time series join the same cluster. We can plot the main cluster for each window, and 
#time series out of the cluster:
par(mfrow = c(2, 2))
ts.plot(X[c(1:(p*w)), tmp[1,] == 1], ylim = c(-4, 4), 
        main = "Time series cluster 1 in window 1")
ts.plot(X[c(1:(p*w)), tmp[1,] != 1], ylim = c(-4, 4), 
        main = "The rest of the time series in window 1")
ts.plot(X[c(1:(p*w)) + s*p, tmp[2,] == 1], ylim = c(-4, 4), 
        main = "Time series cluster 1 in window 2")
ts.plot(X[c(1:(p*w)) + s*p, tmp[2,] != 1], ylim = c(-4, 4), 
        main = "The rest of the time series in window 2")


[Package funtimes version 9.1 Index]