convex_clusterpath {CCMMR}    R Documentation

Minimize the convex clustering loss function

Description

Minimizes the convex clustering loss function for a given set of values for lambda.

Usage

convex_clusterpath(
  X,
  W,
  lambdas,
  tau = 0.001,
  center = TRUE,
  scale = TRUE,
  eps_conv = 1e-06,
  burnin_iter = 25,
  max_iter_conv = 5000,
  save_clusterpath = TRUE,
  target_losses = NULL,
  save_losses = FALSE,
  save_convergence_norms = FALSE
)

Arguments

X

An n x p numeric matrix. This function assumes that each row represents an object with p attributes.

W

A sparseweights object, see sparse_weights.

lambdas

A vector containing the values for the penalty parameter.

tau

Parameter to compute the threshold to fuse clusters. Default is 0.001.

center

If TRUE, center X so that each column has mean zero. Default is TRUE.

scale

If TRUE, scale the loss function to ensure that the cluster solution is invariant to the scale of X. Default is TRUE. Setting this to FALSE is not recommended unless comparing to algorithms that minimize the unscaled convex clustering loss function.

eps_conv

Parameter for determining convergence of the minimization. Default is 1e-6.

burnin_iter

Number of updates of the loss function that are done without step doubling. Default is 25.

max_iter_conv

Maximum number of iterations for minimizing the loss function. Default is 5000.

save_clusterpath

If TRUE, store the solution that minimized the loss function for each lambda. This is required for drawing the clusterpath. Default is TRUE. Storing the clusterpath coordinates requires n x p x (number of lambdas) values, which may require too much memory for large data sets.

target_losses

The values of the loss function that are used to determine convergence of the algorithm (tested as: loss - target <= eps_conv * target). If the input is not NULL, it should be a vector with the same length as lambdas. Great care should be exercised to make sure that the target losses correspond to attainable values for the minimization: they should be obtained with the same inputs (X, W, lambdas) and the same version of the loss function (same settings for center and scale). Default is NULL.

save_losses

If TRUE, return the values of the loss function attained during minimization for each value of lambda. Default is FALSE.

save_convergence_norms

If TRUE, return the norm of the difference between consecutive iterates during minimization for each value of lambda. Default is FALSE. If timing the algorithm is of importance, do not set this to TRUE, as additional computations are done for bookkeeping that are irrelevant to the optimization.
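
The convergence test described for target_losses can be sketched in base R as follows. This is only an illustration of the documented inequality, loss - target <= eps_conv * target, for a single value of lambda. The helper name reached_target and the numbers are hypothetical and not part of CCMMR.

```r
# Hypothetical helper, not part of CCMMR: mirrors the documented test
# loss - target <= eps_conv * target for one value of lambda.
reached_target <- function(loss, target, eps_conv = 1e-06) {
  (loss - target) <= eps_conv * target
}

# Illustrative numbers only: the tolerance here is 1e-06 * 10 = 1e-05
reached_target(loss = 10.000005, target = 10)  # gap of about 5e-06: TRUE
reached_target(loss = 10.001, target = 10)     # gap of about 1e-03: FALSE
```

Because the tolerance is relative to the target, a target that is lower than any attainable loss can never be reached, in which case the minimization would presumably run until max_iter_conv.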

Value

A cvxclust object containing the following:

info

A data frame containing, for each value of lambda, the number of different clusters and the value of the loss function at the minimum.

merge

The merge table containing the order in which the observations in X are clustered.

height

The value for lambda at which each reduction in the number of clusters occurs.

order

The ordering of the observations in X that allows a dendrogram to be drawn without crossing branches.

elapsed_time

The number of seconds that elapsed while running the code. Note that this does not include the time required for input checking and possibly scaling and centering X.

coordinates

The clusterpath coordinates. Only included in the output if save_clusterpath = TRUE.

lambdas

The values for lambda for which a clustering was found.

eps_fusions

The threshold for cluster fusions that was used by the algorithm.

num_clusters

The different numbers of clusters that have been found.

n

The number of observations in X.

losses

Optional: if save_losses = TRUE, the values of the loss function during minimization.

convergence_norms

Optional: if save_convergence_norms = TRUE, the norms of the differences between consecutive iterates during minimization.
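
To judge whether storing coordinates is feasible, a rough lower bound on the memory it needs can be sketched in base R. The sketch assumes the n x p x (number of lambdas) values are held as R doubles of 8 bytes each. The helper name clusterpath_bytes and the sizes used are hypothetical, and the actual internal storage in CCMMR may differ.

```r
# Hypothetical helper, not part of CCMMR: rough size in bytes of
# n x p x n_lambdas values, assuming 8-byte doubles.
clusterpath_bytes <- function(n, p, n_lambdas) {
  # as.numeric() avoids integer overflow for large inputs
  as.numeric(n) * p * n_lambdas * 8
}

# Illustrative sizes only: 1000 observations, 2 attributes, and as many
# lambdas as in seq(0, 2400, 1)
n_lambdas <- length(seq(0, 2400, 1))
bytes <- clusterpath_bytes(n = 1000, p = 2, n_lambdas = n_lambdas)
bytes / 1024^2  # approximate size in MiB
```

If the estimate is too large, a coarser sequence for lambdas or save_clusterpath = FALSE reduces the amount that has to be stored.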

See Also

convex_clustering, sparse_weights

Examples

# Load data
data(two_half_moons)
data = as.matrix(two_half_moons)
X = data[, -3]
y = data[, 3]

# Get sparse weights in dictionary of keys format with k = 5 and phi = 8
W = sparse_weights(X, 5, 8.0)

# Set a sequence for lambda
lambdas = seq(0, 2400, 1)

# Compute clusterpath
res = convex_clusterpath(X, W, lambdas)

# Get cluster labels for two clusters
labels = clusters(res, 2)

# Plot the clusterpath with colors based on the cluster labels
plot(res, col = labels)


[Package CCMMR version 0.2 Index]