FramedClust {OptCirClust} | R Documentation |
Framed Data Clustering
Description
Find the frame of a given size, among all such frames on the input data, whose optimal clustering into K clusters has the smallest total within-cluster sum of squared distances.
Usage
FramedClust(
X,
K,
frame.size,
first.frame = 1,
last.frame = length(X) - frame.size + 1,
method = c("linear.polylog", "kmeans", "Ckmeans.1d.dp")
)
Arguments
X |
a vector of data points to perform framed clustering |
K |
the number of clusters in each frame |
frame.size |
the number of points from X to be included in each frame. It is not the width of the frame. |
first.frame |
starting index of the first frame to be clustered. The first point in the first frame is X[first.frame]. |
last.frame |
starting index of the last frame to be clustered. The first point in the last frame is X[last.frame]. |
method |
the framed clustering method. See Details. |
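The indexing implied by frame.size, first.frame and last.frame can be sketched in base R. This is only an illustration of the argument descriptions above, not code from the package: the helper frame_points is hypothetical, and the sketch assumes a frame is a run of frame.size consecutive elements of X in the order shown (whether the package sorts X internally is not stated here).

```r
# Toy data, already in increasing order for readability.
X <- c(0.2, 1.5, 1.7, 3.0, 3.1, 3.3, 5.8)
frame.size <- 4

# Default search range: every possible starting index.
first.frame <- 1
last.frame <- length(X) - frame.size + 1
n.frames <- last.frame - first.frame + 1

# Hypothetical helper: the points covered by the frame starting at `start`.
frame_points <- function(X, start, frame.size) {
  X[start:(start + frame.size - 1)]
}

# All candidate frames, one list element per starting index.
frames <- lapply(first.frame:last.frame, frame_points,
                 X = X, frame.size = frame.size)

print(n.frames)
print(frames[[1]])
print(frames[[n.frames]])
```

Every frame holds exactly frame.size points, regardless of how far apart those points are, which is why frame.size is a count and not a width.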
Details
The method option "linear.polylog" (default) performs fast optimal framed clustering. The runtime is O(K N \log^2 N) (Debnath and Song 2021).
The "kmeans" option repeatedly calls the heuristic k-means algorithm on every frame, without any guarantee of cluster optimality.
The method option "Ckmeans.1d.dp" performs optimal framed clustering by repeatedly finding the best clustering within each frame using the "Ckmeans.1d.dp" method. At a runtime of O(K N^2), the algorithm is slow but optimal. It is included to provide a baseline.
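The objective that all three methods target can be sketched as a brute-force search in base R. This is a minimal illustration under stated assumptions, not the package's "linear.polylog" algorithm and not the Ckmeans.1d.dp implementation: the function names optimal_withinss_1d and brute_force_frame are hypothetical, X is sorted before frames are formed (an assumption), and the inner dynamic program is a plain textbook version that is slower than the runtimes quoted above.

```r
# Minimum total within-cluster sum of squares for splitting x into K
# contiguous clusters (optimal in one dimension), by dynamic programming.
optimal_withinss_1d <- function(x, K) {
  x <- sort(x)
  n <- length(x)
  stopifnot(K >= 1, K <= n)
  s1 <- c(0, cumsum(x))     # prefix sums of x
  s2 <- c(0, cumsum(x^2))   # prefix sums of x^2
  # Sum of squared deviations from the mean for x[i..j].
  sse <- function(i, j) {
    m <- j - i + 1
    sum1 <- s1[j + 1] - s1[i]
    sum2 <- s2[j + 1] - s2[i]
    max(sum2 - sum1^2 / m, 0)  # clamp tiny negative rounding errors
  }
  # D[k, j]: best total SSE for x[1..j] split into k clusters.
  D <- matrix(Inf, nrow = K, ncol = n)
  for (j in 1:n) D[1, j] <- sse(1, j)
  if (K >= 2) {
    for (k in 2:K) {
      for (j in k:n) {
        for (i in k:j) {  # i is the first index of the last cluster
          cand <- D[k - 1, i - 1] + sse(i, j)
          if (cand < D[k, j]) D[k, j] <- cand
        }
      }
    }
  }
  D[K, n]
}

# Try every frame of frame.size consecutive sorted points; keep the best.
brute_force_frame <- function(X, K, frame.size) {
  X <- sort(X)  # assumption: frames are taken over sorted data
  starts <- seq_len(length(X) - frame.size + 1)
  ssq <- vapply(starts, function(s) {
    optimal_withinss_1d(X[s:(s + frame.size - 1)], K)
  }, numeric(1))
  list(start = which.min(ssq), tot.withinss = min(ssq))
}

best <- brute_force_frame(c(0, 0.1, 5, 5.1, 20, 40), K = 2, frame.size = 4)
print(best)
```

On small inputs, a search like this can serve as a sanity check against the tot.withinss reported by an optimal method.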
Value
An object of class "FramedClust" which has a plot method. It is a list with the following components:
cluster |
a vector of clusters assigned to each element in X. Each cluster is indexed by an integer from 1 to K. NA represents points from X that are outside the optimal frame, thus not part of any cluster. |
centers |
a numeric vector of the means for each cluster in the frame. |
withinss |
a numeric vector of the within-cluster sum of squared distances for each cluster. |
size |
a vector of the number of elements in each cluster. |
totss |
total sum of squared distances between each element and the sample mean. This statistic is not dependent on the clustering result. |
tot.withinss |
total sum of within-cluster squared distances between each element and its cluster mean. This statistic is minimized given the number of clusters. |
betweenss |
sum of squared distances between each cluster mean and sample mean. This statistic is maximized given the number of clusters. |
X_name |
a character string. The actual name of the X argument. |
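How the components relate to one another can be sketched in base R from a cluster vector alone. The cluster assignment below is made up for illustration and does not come from FramedClust. The sketch also assumes that the statistics are computed over the points inside the frame only, and that betweenss weights each cluster mean by its cluster size; neither detail is confirmed by the descriptions above.

```r
X <- c(0.2, 1.5, 1.7, 3.0, 3.1, 3.3, 5.8)
# Hypothetical assignment: NA marks points outside the chosen frame.
cluster <- c(NA, 1, 1, 2, 2, 2, NA)

in.frame <- !is.na(cluster)
x <- X[in.frame]
cl <- cluster[in.frame]

centers  <- as.numeric(tapply(x, cl, mean))    # mean of each cluster
size     <- as.integer(tapply(x, cl, length))  # points per cluster
withinss <- as.numeric(tapply(x, cl, function(v) sum((v - mean(v))^2)))

tot.withinss <- sum(withinss)
totss <- sum((x - mean(x))^2)  # assumed: frame points only
# Assumed: squared distances of cluster means to the sample mean,
# weighted by cluster size, so that totss = tot.withinss + betweenss.
betweenss <- sum(size * (centers - mean(x))^2)

print(centers)
print(size)
print(c(totss = totss, tot.withinss = tot.withinss, betweenss = betweenss))
```

Under these assumptions totss splits exactly into tot.withinss plus betweenss, which is why minimizing one is equivalent to maximizing the other for a fixed frame.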
References
Debnath T, Song M (2021). “Fast optimal circular clustering and applications on round genomes.” IEEE/ACM Transactions on Computational Biology and Bioinformatics. doi: 10.1109/TCBB.2021.3077573.
Examples
library(OptCirClust)

# Example 1: search all frames of 60 points for the best 5-cluster frame
N <- 100
X <- rnorm(N)
K <- 5
frame.size <- 60
result <- FramedClust(X, K, frame.size)
plot(result, main="Example 1. Framed clustering on all frames")

# Example 2: restrict the search to frames starting at indices 30 through 50
frame.size <- 40
first.frame <- 30
last.frame <- 50
method <- "linear.polylog"
result <- FramedClust(X, K, frame.size, first.frame,
                      last.frame, method)
plot(result, main="Example 2. Framed clustering on a subset of frames")