convex_clustering {CCMMR}  R Documentation 
Find a target number of clusters in the data using convex clustering
Description
convex_clustering attempts to find the number of clusters specified by the
user by means of convex clustering. The algorithm searches for each number of
clusters between target_low and target_high. If target_low = target_high, the
algorithm searches for a single clustering. It is recommended to specify a
range around the desired number of clusters, because not every number of
clusters between 1 and nrow(X) may be attainable due to numerical inaccuracies.
Usage
convex_clustering(
X,
W,
target_low,
target_high = NULL,
max_iter_phase_1 = 2000,
max_iter_phase_2 = 20,
lambda_init = 0.01,
factor = 0.025,
tau = 0.001,
center = TRUE,
scale = TRUE,
eps_conv = 1e-6,
burnin_iter = 25,
max_iter_conv = 5000,
save_clusterpath = FALSE,
verbose = 0
)
Arguments
X 
An 
W 
A 
target_low 
Lower bound on the number of clusters that should be
searched for. If 
target_high 
Upper bound on the number of clusters that should be
searched for. Default is NULL.
max_iter_phase_1 
Maximum number of iterations to find an upper and lower bound for the value for lambda for which the desired number of clusters is attained. Default is 2000. 
max_iter_phase_2 
Maximum number of iterations to refine the upper and lower bounds for lambda. Default is 20. 
lambda_init 
The first value for lambda other than 0 to use for convex clustering. Default is 0.01. 
factor 
The percentage by which to increase lambda in each step. Default is 0.025. 
tau 
Parameter to compute the threshold to fuse clusters. Default is 0.001. 
center 
If 
scale 
If 
eps_conv 
Parameter for determining convergence of the minimization. Default is 1e-6. 
burnin_iter 
Number of updates of the loss function that are done without step doubling. Default is 25. 
max_iter_conv 
Maximum number of iterations for minimizing the loss function. Default is 5000. 
save_clusterpath 
If 
verbose 
Verbosity of the information printed during clustering. Default is 0, no output. 
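The interaction between lambda_init and factor can be illustrated with a small base R sketch. It assumes that "percentage" means multiplicative growth, lambda_next = lambda * (1 + factor); this page does not state the exact update rule, so treat that as an assumption. The helper lambda_grid and its n_steps argument are hypothetical and not part of CCMMR.

```r
# Hypothetical helper (not part of CCMMR): the first n_steps nonzero values
# of lambda, ASSUMING lambda grows by a factor (1 + factor) per step.
lambda_grid <- function(lambda_init = 0.01, factor = 0.025, n_steps = 5) {
  lambda_init * (1 + factor)^(seq_len(n_steps) - 1)
}

grid <- lambda_grid()
print(grid)

# Under the same assumption: number of steps before lambda has grown tenfold
steps_for_10x <- ceiling(log(10) / log(1 + 0.025))
print(steps_for_10x)
```

If this reading is right, a smaller factor gives a finer search at the cost of more minimizations, which is what max_iter_phase_1 bounds.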
Value
A cvxclust object containing the following:
info 
A dataframe containing for each value for lambda: the number of different clusters, and the value of the loss function at the minimum. 
merge 
The merge table containing the order at which the
observations in 
height 
The value for lambda at which each reduction in the number of clusters occurs. 
order 
The order of the observations in 
elapsed_time 
The number of seconds that elapsed while running the code. Note that this
does not include the time required for input checking and possibly scaling
and centering X.
coordinates 
The clusterpath coordinates. Only part of the output if
save_clusterpath = TRUE.
lambdas 
The values for lambda for which a clustering was found. 
eps_fusions 
The threshold for cluster fusions that was used by the algorithm. 
phase_1_instances 
The number of instances of the loss function
that were minimized while finding an upper and lower bound for lambda. The
sum 
phase_2_instances 
The number of instances of the loss function
that were minimized while refining the value for lambda. The sum

num_clusters 
The different numbers of clusters that have been found. 
n 
The number of observations in X.
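Because not every requested number of clusters may be attainable, it can help to compare num_clusters against the requested range. The sketch below uses only base R. The helper missing_targets is hypothetical and not part of CCMMR, the input vector is made up for illustration, and treating target_high = NULL as a single target is an assumption.

```r
# Hypothetical helper (not part of CCMMR): which requested cluster counts
# are absent from the num_clusters element of the result?
missing_targets <- function(num_clusters, target_low, target_high = NULL) {
  # ASSUMPTION: target_high = NULL means only target_low was requested
  if (is.null(target_high)) target_high <- target_low
  setdiff(seq(target_low, target_high), num_clusters)
}

# Made-up values for illustration only, not output from a real run
found <- c(2, 3, 5, 7)
not_found <- missing_targets(found, target_low = 2, target_high = 7)
print(not_found)
```

With a real result res, the call would be missing_targets(res$num_clusters, target_low, target_high).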
See Also
convex_clusterpath, sparse_weights
Examples
# Load data
data(two_half_moons)
data = as.matrix(two_half_moons)
X = data[, -3]
y = data[, 3]
# Get sparse weights in dictionary of keys format with k = 5 and phi = 8
W = sparse_weights(X, 5, 8.0)
# Perform convex clustering with a target number of clusters
res1 = convex_clustering(X, W, target_low = 2, target_high = 5)
# Plot the clustering for 2 to 5 clusters
oldpar = par(mfrow=c(2, 2))
plot(X, col = clusters(res1, 2), main = "2 clusters", pch = 19)
plot(X, col = clusters(res1, 3), main = "3 clusters", pch = 19)
plot(X, col = clusters(res1, 4), main = "4 clusters", pch = 19)
plot(X, col = clusters(res1, 5), main = "5 clusters", pch = 19)
# A more generalized approach to plotting the results of a range of clusters
res2 = convex_clustering(X, W, target_low = 2, target_high = 7)
# Plot the clusterings
k = length(res2$num_clusters)
par(mfrow=c(ceiling(k / ceiling(sqrt(k))), ceiling(sqrt(k))))
for (i in 1:k) {
  labels = clusters(res2, res2$num_clusters[i])
  c = length(unique(labels))
  plot(X, col = labels, main = paste(c, "clusters"), pch = 19)
}
par(oldpar)
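# The panel layout passed to par(mfrow = ...) above can be checked on its own.
# grid_dims is a hypothetical helper (not part of CCMMR) that reproduces the
# same rule: a near-square grid with ceiling(sqrt(k)) columns.
grid_dims = function(k) {
  n_col = ceiling(sqrt(k))
  n_row = ceiling(k / n_col)
  c(rows = n_row, cols = n_col)
}
# The grid always has at least k panels, so no plot is dropped
stopifnot(all(sapply(1:12, function(k) prod(grid_dims(k)) >= k)))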