clustering.sc.dp {clustering.sc.dp} | R Documentation |
Optimal Clustering Multidimensional Data with Sequential Constraint by Dynamic Programming
Description
Perform optimal clustering on multidimensional data with sequential constraint (i.e. only subsequent elements of the input may form a cluster).
Usage
clustering.sc.dp(x, k)
Arguments
x |
a multi-dimensional array containing input data to be clustered |
k |
the number of clusters |
Details
The 'clustering.sc.dp
' algorithm (Szkaliczki, 2016) groups multidimensional data given by x
into k
clusters with sequential constraint by dynamic programming. It generalises the one-dimensional 'Ckmeans.1d.dp
' algorithm (Wang and Song, 2011) to multidimensional data. If only subsequent elements of the input data may form a cluster the algorithm guarantees the optimality of clustering – the sum of squares of within-cluster distances (withinss) from each element to its corresponding cluster centre (mean) is always the minimum. The sequential constraint is typically required in clustering datastreams or items with time stamps such as video frames, GPS signals of vehicles or movement data of persons etc. The run time of the algorithm is O( k d n^2) where k, d and n gives the number of clusters, the dimensions of the elements and the number of elements, respectively.
Value
An object of class 'clustering.sc.dp
' which has a print method and is a list with components:
cluster |
a vector of cluster indices assigned to each element in |
centers |
a matrix whose rows represent cluster centres |
withinss |
the within-cluster sum of squares for each cluster |
size |
a vector of the number of points in each cluster |
Author(s)
Tibor Szkaliczki szkaliczki.tibor@sztaki.hu
References
Szkaliczki, T. (2016) "clustering.sc.dp: Optimal Clustering with Sequential Constraint by Using Dynamic Programming" <doi: 10.32614/RJ-2016-022> Wang, H. and Song, M. (2011) "Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming" <doi: 10.32614/RJ-2011-015>
Examples
# Example: clustering data generated from a random walk
x<-matrix(, nrow = 100, ncol = 2)
x[1,]<-c(0,0)
for(i in 2:100) {
x[i,1]<-x[i-1,1] + rnorm(1,0,0.1)
x[i,2]<-x[i-1,2] + rnorm(1,0,0.1)
}
k<-2
result<-clustering.sc.dp(x,k)
plot(x, type = 'b', col = result$cluster)
points(result$centers, pch = 24, bg = (1:k))