BICC {funtimes} | R Documentation |
BIC-Based Spatio-Temporal Clustering
Description
Apply the algorithm of unsupervised spatio-temporal clustering, TRUST
(Ciampi et al. 2010), with automatic selection of its tuning parameters
Delta and Epsilon based on the Bayesian information criterion, BIC
(Schaeffer et al. 2016).
Usage
BICC(X, Alpha = NULL, Beta = NULL, Theta = 0.8, p, w, s)
Arguments
X |
a matrix of time series (time series in columns, time layers in rows). |
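Alpha |
lower limit of the time-series domain, passed on to the clustering functions called internally (see Details). |
Beta |
upper limit of the time-series domain, passed on to the clustering functions called internally (see Details). |
Theta |
connectivity parameter, passed on to the clustering functions called internally (see Details). |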
p |
number of layers (time-series observations) in each slide. |
w |
number of slides in each window. |
s |
step to shift a window, measured in the number of slides. The recommended
values are 1 (overlapping windows) or equal to w (non-overlapping windows). |
Details
This is the upper-level function for time series clustering.
It uses the functions CWindowCluster and CSlideCluster to cluster time series
based on closeness and homogeneity measures. Clustering is performed multiple
times over a range of equidistant values of the parameters Delta and Epsilon;
the optimal Delta and Epsilon, along with the corresponding clustering
results, are then returned (see Schaeffer et al. 2016 for more details).
The total length of the time series (number of levels, i.e., nrow(X))
should be divisible by p.
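The divisibility requirement can be illustrated with a short base-R sketch. It is not the package's internal code: the names n_slides, slide_id, slides, and n_windows are hypothetical, and the window count assumes the usual sliding-window arithmetic for a window of w slides shifted by s slides.

```r
# Toy data with the same dimensions as in the Examples:
# 36 time layers (rows) for 10 time series (columns)
X <- matrix(rnorm(36 * 10), nrow = 36, ncol = 10)
p <- 12  # layers per slide
w <- 3   # slides per window
s <- 3   # window shift, in slides

# The total number of layers must be a multiple of p
stopifnot(nrow(X) %% p == 0)

# Number of slides and the slide each row belongs to
n_slides <- nrow(X) / p
slide_id <- rep(seq_len(n_slides), each = p)

# Split the rows of X into a list of p-row slides
slides <- lapply(split(seq_len(nrow(X)), slide_id),
                 function(rows) X[rows, , drop = FALSE])

# Assumed number of windows (not taken from the package source)
n_windows <- floor((n_slides - w) / s) + 1
```

With these values there are 3 slides of 12 rows each and a single window, which agrees with the note in Example 1 that only one window of data is available.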
Value
A list with the following elements:
delta.opt |
optimal value for the clustering parameter Delta. |
epsilon.opt |
optimal value for the clustering parameter Epsilon. |
clusters |
vector of length |
IC |
values of the information criterion (BIC) for each considered
combination of Delta and Epsilon. |
delta.all |
vector of considered values for Delta. |
epsilon.all |
vector of considered values for Epsilon. |
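The relationship between a grid of criterion values and the selected parameters can be sketched in base R. The example below is illustrative only and uses made-up numbers: the objects delta_all, epsilon_all, IC, and best are hypothetical stand-ins, the layout (Delta in rows, Epsilon in columns) is an assumption, and it is assumed that a smaller BIC is preferred, as is the usual convention.

```r
# Hypothetical grids and criterion values, for illustration only
delta_all   <- seq(0.1, 0.5, by = 0.1)
epsilon_all <- seq(0.2, 0.8, by = 0.2)
IC <- outer(delta_all, epsilon_all,
            function(d, e) (d - 0.3)^2 + (e - 0.6)^2)

# Row and column of the smallest criterion value
# (the first position is used if there are ties)
best <- which(IC == min(IC), arr.ind = TRUE)[1, ]

# Parameter values at that grid position
delta_opt   <- delta_all[best[["row"]]]
epsilon_opt <- epsilon_all[best[["col"]]]
```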
Author(s)
Ethan Schaeffer, Vyacheslav Lyubchich
References
Ciampi A, Appice A, Malerba D (2010).
“Discovering trend-based clusters in spatially distributed data streams.”
In International Workshop of Mining Ubiquitous and Social Environments, 107–122.
Schaeffer ED, Testa JM, Gel YR, Lyubchich V (2016).
“On information criteria for dynamic spatio-temporal clustering.”
In Banerjee A, Ding W, Dy JG, Lyubchich V, Rhines A (eds.), The 6th International Workshop on Climate Informatics: CI2016, 5–8.
doi:10.5065/D6K072N6.
See Also
CSlideCluster, CWindowCluster, purity
Examples
# Fix seed for reproducible simulations:
set.seed(1)
##### Example 1
# Similar to Schaeffer et al. (2016), simulate 3 years of monthly data
# for 10 locations and apply clustering:
# 1.1 Simulation
T <- 36 #total months
N <- 10 #locations
phi <- c(0.5) #parameter of autoregression
burn <- 300 #burn-in period for simulations
X <- sapply(1:N, function(x)
    arima.sim(n = T + burn,
              list(order = c(length(phi), 0, 0), ar = phi)))[(burn + 1):(T + burn), ]
colnames(X) <- paste("TS", c(1:dim(X)[2]), sep = "")
# 1.2 Clustering
# Assume that information arrives in year-long slides or data chunks
p <- 12 #number of time layers (months) in a slide
# Let the upper level of clustering (window) be the whole period of 3 years, so
w <- 3 #number of slides in a window
# Step to shift a window; it does not matter much here,
# as we have only one window of data
s <- w
tmp <- BICC(X, p = p, w = w, s = s)
# 1.3 Evaluate clustering
# In these simulations, it is known that all time series belong to one class,
# since they were all simulated the same way:
classes <- rep(1, 10)
# Use the information on the classes to calculate clustering purity:
purity(classes, tmp$clusters[1,])
##### Example 2
# 2.1 Modify time series and update classes accordingly:
# Add a mean shift to half of the time series:
X2 <- X
X2[, 1:(N/2)] <- X2[, 1:(N/2)] + 3
classes2 <- rep(c(1, 2), each = N/2)
# 2.2 Re-apply clustering procedure and evaluate clustering purity:
tmp2 <- BICC(X2, p = p, w = w, s = s)
tmp2$clusters
purity(classes2, tmp2$clusters[1,])
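
The purity measure used in the examples above can be illustrated by hand with the standard textbook definition: each cluster is credited with its most frequent true class, and the credited counts are divided by the number of series. The function purity_by_hand below is a hypothetical helper written for this sketch, not the package's purity function, whose output format may differ.

```r
# Standard definition of clustering purity (hypothetical helper)
purity_by_hand <- function(classes, clusters) {
    # Contingency table: rows are clusters, columns are true classes
    tab <- table(clusters, classes)
    # Largest class count within each cluster, summed and normalized
    sum(apply(tab, 1, max)) / length(classes)
}

# Two true classes of 5 series each; one series is misassigned
classes_toy  <- rep(c(1, 2), each = 5)
clusters_toy <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
purity_by_hand(classes_toy, clusters_toy)
```

Here cluster 1 contributes 4 correctly grouped series and cluster 2 contributes 5, so the purity is 9/10.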