R: Minimize the convex clustering loss function

convex_clusterpath {CCMMR}

R Documentation

Minimize the convex clustering loss function

Description

Minimizes the convex clustering loss function for a given set of values for lambda.

Usage

convex_clusterpath(
  X,
  W,
  lambdas,
  tau = 0.001,
  center = TRUE,
  scale = TRUE,
  eps_conv = 1e-06,
  burnin_iter = 25,
  max_iter_conv = 5000,
  save_clusterpath = TRUE,
  target_losses = NULL,
  save_losses = FALSE,
  save_convergence_norms = FALSE
)

Arguments

`X`	An `n` x `p` numeric matrix. This function assumes that each row represents an object with `p` attributes.
`W`	A `sparseweights` object; see `sparse_weights`.
`lambdas`	A vector containing the values for the penalty parameter.
`tau`	Parameter to compute the threshold to fuse clusters. Default is 0.001.
`center`	If `TRUE`, center `X` so that each column has mean zero. Default is `TRUE`.
`scale`	If `TRUE`, scale the loss function to ensure that the cluster solution is invariant to the scale of `X`. Default is `TRUE`. Setting this to `FALSE` is not recommended unless you are comparing against algorithms that minimize the unscaled convex clustering loss function.
`eps_conv`	Parameter for determining convergence of the minimization. Default is 1e-6.
`burnin_iter`	Number of updates of the loss function that are done without step doubling. Default is 25.
`max_iter_conv`	Maximum number of iterations for minimizing the loss function. Default is 5000.
`save_clusterpath`	If `TRUE`, store the solution that minimized the loss function for each lambda. This is required for drawing the clusterpath. Default is `TRUE`. Storing the clusterpath coordinates requires `n` x `p` x (number of lambdas) values, which may require too much memory for large data sets.
`target_losses`	The values of the loss function that are used to determine convergence of the algorithm (tested as: loss - target <= `eps_conv` * target). If the input is not `NULL`, it should be a vector with the same length as `lambdas`. Great care should be exercised to make sure that the target losses correspond to attainable values for the minimization: they should be obtained with the same inputs (`X`, `W`, `lambdas`) and with the same version of the loss function (centered, scaled). Default is `NULL`.
`save_losses`	If `TRUE`, return the values of the loss function attained during minimization for each value of lambda. Default is `FALSE`.
`save_convergence_norms`	If `TRUE`, return the norm of the difference between consecutive iterates during minimization for each value of lambda. Default is `FALSE`. If you are timing the algorithm, leave this set to `FALSE`, because storing the norms requires additional bookkeeping computations that are irrelevant to the optimization.
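
The convergence test stated for `target_losses` can be illustrated in base R. This is only a sketch of the documented inequality (loss - target <= `eps_conv` * target); the helper name `reached_target` and the loss values are hypothetical and are not part of CCMMR.

```r
# Hypothetical helper (not part of CCMMR): mirrors the documented test
# loss - target <= eps_conv * target.
reached_target <- function(loss, target, eps_conv = 1e-6) {
  (loss - target) <= eps_conv * target
}

# Made-up loss values, for illustration only
target <- 100

# Gap of about 5e-05 is within the tolerance 1e-06 * 100 = 1e-04
reached_target(100.00005, target)

# Gap of about 1e-03 exceeds the tolerance
reached_target(100.001, target)
```

Under this test, a target that lies well below the attainable minimum can never be reached, which is one reason the targets should come from a run with identical inputs and the same centering and scaling settings.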

Value

A `cvxclust` object containing the following:

`info`	A data frame containing, for each value of lambda, the number of different clusters and the value of the loss function at the minimum.
`merge`	The merge table containing the order in which the observations in `X` are clustered.
`height`	The value for lambda at which each reduction in the number of clusters occurs.
`order`	The order of the observations in `X` in order to draw a dendrogram without conflicting branches.
`elapsed_time`	The number of seconds that elapsed while running the code. Note that this does not include the time required for input checking and possibly scaling and centering `X`.
`coordinates`	The clusterpath coordinates. Only included in the output if `save_clusterpath = TRUE`.
`lambdas`	The values for lambda for which a clustering was found.
`eps_fusions`	The threshold for cluster fusions that was used by the algorithm.
`num_clusters`	The different numbers of clusters that have been found.
`n`	The number of observations in `X`.
`losses`	Optional: if `save_losses = TRUE`, the values of the loss function during minimization.
`convergence_norms`	Optional: if `save_convergence_norms = TRUE`, the norms of the differences between consecutive iterates during minimization.
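
Because `coordinates` holds `n` x `p` x (number of lambdas) values, it can help to estimate its size before requesting it. The sketch below is a rough base R estimate that assumes 8-byte doubles and ignores any object overhead; the helper name `clusterpath_mib` and the dimensions are hypothetical, and the package's actual storage may differ.

```r
# Hypothetical helper (not part of CCMMR): size in MiB of an
# n x p x n_lambdas array of 8-byte doubles, ignoring object overhead.
clusterpath_mib <- function(n, p, n_lambdas) {
  n * p * n_lambdas * 8 / 1024^2
}

# Made-up dimensions: 10,000 observations with 2 attributes, and a
# lambda grid built like the one in the Examples section (2,401 values)
n_lambdas <- length(seq(0, 2400, 1))
clusterpath_mib(10000, 2, n_lambdas)  # roughly 366 MiB
```

If the estimate is too large, a coarser grid for `lambdas` or `save_clusterpath = FALSE` reduces the amount that has to be stored.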

Examples

# Load data
data(two_half_moons)
data = as.matrix(two_half_moons)
X = data[, -3]
y = data[, 3]

# Get sparse weights in dictionary of keys format with k = 5 and phi = 8
W = sparse_weights(X, 5, 8.0)

# Set a sequence for lambda
lambdas = seq(0, 2400, 1)

# Compute clusterpath
res = convex_clusterpath(X, W, lambdas)

# Get cluster labels for two clusters
labels = clusters(res, 2)

# Plot the clusterpath with colors based on the cluster labels
plot(res, col = labels)

[Package CCMMR version 0.2 Index]
