FramedClust {OptCirClust}    R Documentation

Framed Data Clustering

Description

Among all possible frames of a given size on the input data, find the frame whose optimal clustering into K clusters has the smallest within-cluster sum of squared distances.
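Written as a formula, the search can be stated as below. The symbols are introduced here only for illustration: m = frame.size, f = first.frame, l = last.frame, C_1, ..., C_K are nonempty index sets of the clusters within the frame starting at X[s], and \bar{X}_{C_k} is the mean of the points in C_k.

```latex
\min_{f \le s \le \ell} \;
\min_{C_1, \dots, C_K}
\sum_{k=1}^{K} \sum_{i \in C_k} \bigl( X_i - \bar{X}_{C_k} \bigr)^2,
\quad \text{where } C_1, \dots, C_K \text{ partition } \{s, s+1, \dots, s+m-1\}.
```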

Usage

FramedClust(
  X,
  K,
  frame.size,
  first.frame = 1,
  last.frame = length(X) - frame.size + 1,
  method = c("linear.polylog", "kmeans", "Ckmeans.1d.dp")
)

Arguments

X

a numeric vector of data points on which to perform framed clustering

K

the number of clusters in each frame

frame.size

the number of points from X to be included in each frame. It is not the width of the frame.

first.frame

starting index of the first frame to be clustered. The first point in the first frame is X[first.frame].

last.frame

starting index of the last frame to be clustered. The first point in the last frame is X[last.frame].

method

the framed clustering method. See Details.
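The arguments frame.size, first.frame and last.frame together determine which windows of X are searched. The sketch below lists those windows as index ranges; frame_indices() is a hypothetical helper written for illustration, not a function of OptCirClust. It also shows that frame.size counts points rather than measuring a width: every frame holds exactly frame.size consecutive elements of X.

```r
# Hypothetical helper (not part of OptCirClust): list the index ranges of the
# frames implied by frame.size, first.frame and last.frame, following the
# argument descriptions above.
frame_indices <- function(n, frame.size,
                          first.frame = 1,
                          last.frame = n - frame.size + 1) {
  stopifnot(frame.size >= 1, frame.size <= n,
            first.frame >= 1, last.frame >= first.frame,
            last.frame + frame.size - 1 <= n)
  starts <- first.frame:last.frame
  # Each frame holds frame.size consecutive points, starting at X[s].
  lapply(starts, function(s) s:(s + frame.size - 1))
}

# The settings of Example 2 below: 100 points, frames of 40 points
# starting at X[30] through X[50].
frames <- frame_indices(n = 100, frame.size = 40,
                        first.frame = 30, last.frame = 50)
length(frames)       # 21 frames
range(frames[[1]])   # 30 69
range(frames[[21]])  # 50 89
```

With the default last.frame = length(X) - frame.size + 1, the last frame ends exactly at the final element of X.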

Details

The method option "linear.polylog" (default) performs fast optimal framed clustering. Its runtime is O(K N log^2 N), where N is the length of X (Debnath and Song 2021).

The "kmeans" option repeatedly calls the heuristic k-means algorithm on all frames, without any guarantee of cluster optimality.
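The idea behind the "kmeans" option can be sketched with base R's stats::kmeans(): cluster each frame independently and keep the frame with the smallest total within-cluster sum of squares. framed_kmeans_sketch() is a hypothetical illustration, not the code used inside OptCirClust, and the nstart value is an arbitrary choice.

```r
# Hypothetical sketch of heuristic framed clustering: run stats::kmeans()
# in every frame and keep the frame with the smallest tot.withinss.
# Because k-means is a heuristic, the result is not guaranteed optimal.
framed_kmeans_sketch <- function(X, K, frame.size,
                                 first.frame = 1,
                                 last.frame = length(X) - frame.size + 1,
                                 nstart = 10) {
  best <- list(start = NA_integer_, tot.withinss = Inf)
  for (s in first.frame:last.frame) {
    idx <- s:(s + frame.size - 1)
    fit <- stats::kmeans(X[idx], centers = K, nstart = nstart)
    if (fit$tot.withinss < best$tot.withinss) {
      best <- list(start = s, tot.withinss = fit$tot.withinss)
    }
  }
  best
}

set.seed(1)
X <- rnorm(100)
res <- framed_kmeans_sketch(X, K = 3, frame.size = 40)
res$start         # starting index of the best frame found
res$tot.withinss  # its (heuristic) within-cluster sum of squares
```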

The method option "Ckmeans.1d.dp" performs optimal framed clustering by repeatedly finding the best clustering within each frame using the "Ckmeans.1d.dp" method. With a runtime of O(K N^2), it is slow but optimal; it is included to provide a baseline.
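An exact baseline of this kind can be sketched without any package: solve optimal 1-D k-means inside each frame, then keep the best frame. The sketch below swaps in a plain O(K m^2) dynamic program per frame (m = frame.size) rather than calling Ckmeans.1d.dp, so it is slower than the complexity quoted above; optimal_withinss_1d() and framed_exact_sketch() are hypothetical helpers for illustration only.

```r
# Hypothetical helper: optimal 1-D k-means cost (minimum within-cluster sum
# of squares) by dynamic programming in O(K n^2). In sorted order, optimal
# 1-D clusters are contiguous, so all cluster boundaries can be searched.
optimal_withinss_1d <- function(x, K) {
  x <- sort(x)
  n <- length(x)
  stopifnot(K >= 1, K <= n)
  cs  <- c(0, cumsum(x))    # prefix sums
  cs2 <- c(0, cumsum(x^2))  # prefix sums of squares
  # Sum of squared deviations of x[i..j] from their mean.
  ssq <- function(i, j) {
    s <- cs[j + 1] - cs[i]
    (cs2[j + 1] - cs2[i]) - s^2 / (j - i + 1)
  }
  # D[k, j]: minimum cost of splitting x[1..j] into k clusters.
  D <- matrix(Inf, nrow = K, ncol = n)
  for (j in 1:n) D[1, j] <- ssq(1, j)
  if (K >= 2) {
    for (k in 2:K) {
      for (j in k:n) {
        # Last cluster is x[i..j]; the other k - 1 clusters cover x[1..i-1].
        for (i in k:j) {
          cand <- D[k - 1, i - 1] + ssq(i, j)
          if (cand < D[k, j]) D[k, j] <- cand
        }
      }
    }
  }
  D[K, n]
}

# Hypothetical exact framed clustering: evaluate every frame, keep the best.
framed_exact_sketch <- function(X, K, frame.size,
                                first.frame = 1,
                                last.frame = length(X) - frame.size + 1) {
  starts <- first.frame:last.frame
  w <- vapply(starts, function(s)
    optimal_withinss_1d(X[s:(s + frame.size - 1)], K), numeric(1))
  list(start = starts[which.min(w)], tot.withinss = min(w))
}

set.seed(1)
X <- rnorm(100)
framed_exact_sketch(X, K = 3, frame.size = 40)
```

Because every frame is solved exactly, the cost returned here is never larger than what the heuristic k-means approach finds on the same frames.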

Value

An object of class "FramedClust" which has a plot method. It is a list with the following components:

cluster

a vector of clusters assigned to each element in X. Each cluster is indexed by an integer from 1 to K. NA represents points from X that are outside the optimal frame, thus not part of any cluster.

centers

a numeric vector of the means for each cluster in the frame.

withinss

a numeric vector of the within-cluster sum of squared distances for each cluster.

size

a vector of the number of elements in each cluster.

totss

total sum of squared distances between each element and the sample mean. This statistic is not dependent on the clustering result.

tot.withinss

total sum of within-cluster squared distances between each element and its cluster mean. This statistic is minimized given the number of clusters.

betweenss

sum of squared distances between each cluster mean and sample mean. This statistic is maximized given the number of clusters.

X_name

a character string. The actual name of the X argument.
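The components above are related: for the clustered points, totss = tot.withinss + betweenss. The sketch below recomputes the statistics from a cluster vector in which NA marks points outside the frame. It assumes, as hypotheses not stated on this page, that totss refers only to the points inside the frame and that betweenss follows the usual k-means convention (as in stats::kmeans) of weighting each squared center-to-mean distance by cluster size. It also recovers the frame start as the first non-NA position, since a frame consists of consecutive elements of X. frame_stats() is a hypothetical helper, not part of OptCirClust.

```r
# Hypothetical helper: recompute summary statistics from a cluster vector
# where NA marks points outside the chosen frame.
frame_stats <- function(X, cluster) {
  inside <- !is.na(cluster)
  x  <- X[inside]
  cl <- cluster[inside]
  centers  <- tapply(x, cl, mean)
  size     <- tapply(x, cl, length)
  withinss <- tapply(x, cl, function(v) sum((v - mean(v))^2))
  list(frame.start  = min(which(inside)),  # first point of the frame
       centers      = as.vector(centers),
       size         = as.vector(size),
       withinss     = as.vector(withinss),
       totss        = sum((x - mean(x))^2),
       tot.withinss = sum(withinss),
       # Between-cluster sum of squares, weighted by cluster size.
       betweenss    = sum(size * (centers - mean(x))^2))
}

X <- c(9, 1, 2, 6, 7, 40)
cluster <- c(NA, 1, 1, 2, 2, NA)  # a made-up result: frame is X[2:5]
st <- frame_stats(X, cluster)
st$frame.start                    # 2
st$totss                          # 26
st$tot.withinss + st$betweenss    # 1 + 25 = 26
```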

References

Debnath T, Song M (2021). “Fast optimal circular clustering and applications on round genomes.” IEEE/ACM Transactions on Computational Biology and Bioinformatics. doi: 10.1109/TCBB.2021.3077573.

Examples

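library(OptCirClust)

set.seed(1)  # make the random example reproducible
N <- 100
X <- rnorm(N)
K <- 5
frame.size <- 60

# Example 1: search all frames of 60 consecutive points
result <- FramedClust(X, K, frame.size)
plot(result, main="Example 1. Framed clustering on all frames")

# Example 2: search only the frames starting at X[30] through X[50]
frame.size <- 40
first.frame <- 30
last.frame <- 50
method <- "linear.polylog"

result <- FramedClust(X, K, frame.size, first.frame,
                      last.frame, method)
plot(result, main="Example 2. Framed clustering on a subset of frames")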


[Package OptCirClust version 0.0.4 Index]