MultiChannel.WUC {Ckmeans.1d.dp} | R Documentation |
Optimal Multi-channel Weighted Univariate Clustering
Description
Perform optimal multi-channel weighted univariate k-means clustering in linear time.
Usage
MultiChannel.WUC(x, y, k=c(1,9))
Arguments
x |
a numeric vector of data to be clustered. All |
y |
a numeric matrix of non-negative weights for each element in x |
k |
either an exact integer number of clusters, or a vector of length two specifying the minimum and maximum numbers of clusters to be examined. The default is c(1,9). |
Details
MultiChannel.WUC minimizes the total weighted within-cluster sum of squared distances (Zhong 2019). It uses the SMAWK algorithm (Aggarwal et al. 1987) with a modified data structure to speed up the dynamic programming to linear runtime. When k is given as a range, the method selects an optimal k within that range based on an approximate Gaussian mixture model, using the Bayesian information criterion (BIC).
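The objective described above can be illustrated with a much simpler dynamic program. The sketch below is not the package's implementation: it assumes a single weight vector w instead of a matrix of channels, takes a fixed k, runs in O(k n^2) time rather than linear time, and does not use SMAWK. The helper name wkmeans_1d_dp is hypothetical.

```r
# Illustrative dynamic program for weighted univariate k-means with ONE
# weight vector. Hypothetical helper, not part of Ckmeans.1d.dp.
wkmeans_1d_dp <- function(x, w, k) {
  stopifnot(length(x) == length(w), all(w > 0), k >= 1, k <= length(x))
  ord <- order(x)                 # optimal 1-D clusters are contiguous once sorted
  xs <- x[ord]
  ws <- w[ord]
  n <- length(xs)

  # Prefix sums give the weighted SSE of any interval in O(1).
  cw   <- c(0, cumsum(ws))
  cwx  <- c(0, cumsum(ws * xs))
  cwxx <- c(0, cumsum(ws * xs^2))
  sse <- function(i, j) {         # weighted SSE of sorted points i..j
    sw   <- cw[j + 1] - cw[i]
    swx  <- cwx[j + 1] - cwx[i]
    swxx <- cwxx[j + 1] - cwxx[i]
    max(swxx - swx^2 / sw, 0)     # guard against tiny negative round-off
  }

  D <- matrix(Inf, nrow = k, ncol = n)  # D[q, j]: best cost, first j points, q clusters
  B <- matrix(1L,  nrow = k, ncol = n)  # B[q, j]: start index of the last cluster
  for (j in 1:n) D[1, j] <- sse(1, j)
  if (k >= 2) {
    for (q in 2:k) {
      for (j in q:n) {
        for (i in q:j) {                # last cluster covers sorted points i..j
          cost <- D[q - 1, i - 1] + sse(i, j)
          if (cost < D[q, j]) {
            D[q, j] <- cost
            B[q, j] <- i
          }
        }
      }
    }
  }

  # Backtrack from the last cluster to recover the assignment.
  cl <- integer(n)
  j <- n
  for (q in k:1) {
    i <- B[q, j]
    cl[i:j] <- q
    j <- i - 1L
  }
  cluster <- integer(n)
  cluster[ord] <- cl                    # map back to the original order of x
  list(cluster = cluster, tot.withinss = D[k, n])
}

fit <- wkmeans_1d_dp(x = c(1, 2, 3, 10, 11, 12), w = rep(1, 6), k = 2)
fit$cluster
fit$tot.withinss
```

The triple loop makes the recurrence explicit: the best cost for the first j points in q clusters is the minimum, over every start i of the last cluster, of the best cost for the first i - 1 points in q - 1 clusters plus the cost of the interval i..j. SMAWK-style matrix searching replaces the inner minimization with a faster search.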
Value
A list object containing the following components:
cluster |
a vector of clusters assigned to each element in x. |
centers |
a numeric vector of the (weighted) means for each cluster. |
withinss |
a numeric vector of the (weighted) within-cluster sum of squares for each cluster. |
size |
a vector of the (weighted) number of elements in each cluster. |
totss |
total sum of (weighted) squared distances between each element and the sample mean. This statistic is not dependent on the clustering result. |
tot.withinss |
total sum of (weighted) within-cluster squared distances between each element and its cluster mean. This statistic is minimized given the number of clusters. |
betweenss |
sum of (weighted) squared distances between each cluster mean and sample mean. This statistic is maximized given the number of clusters. |
xname |
a character string. The actual name of the x argument. |
yname |
a character string. The actual name of the y argument. |
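How the sum-of-squares components relate to one another can be checked with a few lines of base R. The sketch below assumes a single weight vector w and a given cluster assignment; how MultiChannel.WUC aggregates these statistics across several channels is not stated here, so this is an illustration of the definitions rather than a reproduction of the returned values. The helper name weighted_ss is hypothetical.

```r
# Weighted sum-of-squares statistics for ONE weight vector and a given
# clustering. Hypothetical helper, not part of Ckmeans.1d.dp.
weighted_ss <- function(x, w, cluster) {
  stopifnot(length(x) == length(w), length(x) == length(cluster))
  ids <- sort(unique(cluster))
  grand_mean <- sum(w * x) / sum(w)          # weighted sample mean

  # Weighted size and weighted mean of each cluster.
  size <- sapply(ids, function(g) sum(w[cluster == g]))
  centers <- sapply(ids, function(g) {
    in_g <- cluster == g
    sum(w[in_g] * x[in_g]) / sum(w[in_g])
  })

  # Weighted squared distances of elements to their own cluster mean.
  withinss <- sapply(seq_along(ids), function(i) {
    in_g <- cluster == ids[i]
    sum(w[in_g] * (x[in_g] - centers[i])^2)
  })

  list(
    centers      = centers,
    withinss     = withinss,
    size         = size,
    totss        = sum(w * (x - grand_mean)^2),
    tot.withinss = sum(withinss),
    betweenss    = sum(size * (centers - grand_mean)^2)
  )
}

ss <- weighted_ss(x = c(1, 2, 3, 10, 11, 12),
                  w = c(1, 2, 1, 1, 1, 2),
                  cluster = c(1, 1, 1, 2, 2, 2))
# With cluster means weighted by cluster size, the total splits exactly:
all.equal(ss$totss, ss$tot.withinss + ss$betweenss)
```

Because totss does not depend on the clustering, minimizing tot.withinss for a given number of clusters is equivalent to maximizing betweenss under this decomposition.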
Author(s)
Hua Zhong and Mingzhou Song
References
Aggarwal A, Klawe MM, Moran S, Shor P, Wilber R (1987).
“Geometric applications of a matrix-searching algorithm.”
Algorithmica, 2(1-4), 195–208.
doi:10.1007/BF01840359.
Zhong H (2019).
Model-free Gene-to-zone Network Inference of Molecular Mechanisms in Biology.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Examples
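# Example 1: random integer data with two weight channels (columns of Y)
x <- sample(x = c(1:100), size = 20, replace = TRUE)
Y <- matrix(sample(x = c(1:100), size = 40, replace = TRUE), ncol=2, nrow=length(x))
# Examine between 1 and 10 clusters and let the method choose k
res <- MultiChannel.WUC(x = x, y = Y, k = c(1, 10))
plot(res)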
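# Example 2: three Gaussian groups, each carrying weight in its own channel
n <- c(20, 20, 20)
x <- c(rnorm(n[1], mean=-6),
       rnorm(n[2], mean=0),
       rnorm(n[3], mean=6))
# One row of Y per element of x; column j holds the weight in channel j
Y <- matrix(c(
  rep(c(1,0,0), times=n[1]),
  rep(c(0,1,0), times=n[2]),
  rep(c(0,0,1), times=n[3])
), byrow=TRUE, nrow=length(x))
res <- MultiChannel.WUC(x = x, y = Y, k = 3)
# Tighten the plot margins, then restore the previous graphical parameters
opar <- par(mar=c(3,3,2.5,1), mgp=c(1.5,0.5,0))
plot(res)
par(opar)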