convex_clustering {CCMMR} | R Documentation |
Find a target number of clusters in the data using convex clustering
Description
convex_clustering
attempts to find the number of clusters
specified by the user by means of convex clustering. The algorithm looks for
each number of clusters between target_low
and target_high
. If
target_low
= target_high
, the algorithm searches for a single
clustering. It is recommended to specify a range around the desired number of
clusters, as not each number of clusters between 1 and nrow(X)
may be
attainable due to numerical inaccuracies.
Usage
convex_clustering(
X,
W,
target_low,
target_high = NULL,
max_iter_phase_1 = 2000,
max_iter_phase_2 = 20,
lambda_init = 0.01,
factor = 0.025,
tau = 0.001,
center = TRUE,
scale = TRUE,
eps_conv = 1e-06,
burnin_iter = 25,
max_iter_conv = 5000,
save_clusterpath = FALSE,
verbose = 0
)
Arguments
X |
An |
W |
A |
target_low |
Lower bound on the number of clusters that should be
searched for. If |
target_high |
Upper bound on the number of clusters that should be
searched for. Default is |
max_iter_phase_1 |
Maximum number of iterations to find an upper and lower bound for the value for lambda for which the desired number of clusters is attained. Default is 2000. |
max_iter_phase_2 |
Maximum number of iterations to to refine the upper and lower bounds for lambda. Default is 20. |
lambda_init |
The first value for lambda other than 0 to use for convex clustering. Default is 0.01. |
factor |
The percentage by which to increase lambda in each step. Default is 0.025. |
tau |
Parameter to compute the threshold to fuse clusters. Default is 0.001. |
center |
If |
scale |
If |
eps_conv |
Parameter for determining convergence of the minimization. Default is 1e-6. |
burnin_iter |
Number of updates of the loss function that are done without step doubling. Default is 25. |
max_iter_conv |
Maximum number of iterations for minimizing the loss function. Default is 5000. |
save_clusterpath |
If |
verbose |
Verbosity of the information printed during clustering. Default is 0, no output. |
Value
A cvxclust
object containing the following
info |
A dataframe containing for each value for lambda: the number of different clusters, and the value of the loss function at the minimum. |
merge |
The merge table containing the order at which the
observations in |
height |
The value for lambda at which each reduction in the number of clusters occurs. |
order |
The order of the observations in |
elapsed_time |
The number of seconds that elapsed while
running the code. Note that this does not include the time required for
input checking and possibly scaling and centering |
coordinates |
The clusterpath coordinates. Only part of the
output in case that |
lambdas |
The values for lambda for which a clustering was found. |
eps_fusions |
The threshold for cluster fusions that was used by the algorithm. |
phase_1_instances |
The number of instances of the loss function
that were minimized while finding an upper and lower bound for lambda. The
sum |
phase_2_instances |
The number of instances of the loss function
that were minimized while refining the value for lambda. The sum
|
num_clusters |
The different numbers of clusters that have been found. |
n |
The number of observations in |
See Also
convex_clusterpath, sparse_weights
Examples
# Load data
data(two_half_moons)
data = as.matrix(two_half_moons)
X = data[, -3]
y = data[, 3]
# Get sparse weights in dictionary of keys format with k = 5 and phi = 8
W = sparse_weights(X, 5, 8.0)
# Perform convex clustering with a target number of clusters
res1 = convex_clustering(X, W, target_low = 2, target_high = 5)
# Plot the clustering for 2 to 5 clusters
oldpar = par(mfrow=c(2, 2))
plot(X, col = clusters(res1, 2), main = "2 clusters", pch = 19)
plot(X, col = clusters(res1, 3), main = "3 clusters", pch = 19)
plot(X, col = clusters(res1, 4), main = "4 clusters", pch = 19)
plot(X, col = clusters(res1, 5), main = "5 clusters", pch = 19)
# A more generalized approach to plotting the results of a range of clusters
res2 = convex_clustering(X, W, target_low = 2, target_high = 7)
# Plot the clusterings
k = length(res2$num_clusters)
par(mfrow=c(ceiling(k / ceiling(sqrt(k))), ceiling(sqrt(k))))
for (i in 1:k) {
labels = clusters(res2, res2$num_clusters[i])
c = length(unique(labels))
plot(X, col = labels, main = paste(c, "clusters"), pch = 19)
}
par(oldpar)