R: Non-hierarchical clustering: CLARA

nhclu_clara {bioregion}

R Documentation

Non-hierarchical clustering: CLARA

Description

This function performs non-hierarchical clustering based on dissimilarity, by partitioning around medoids with the Clustering Large Applications (CLARA) algorithm.

Usage

nhclu_clara(
  dissimilarity,
  index = names(dissimilarity)[3],
  seed = NULL,
  n_clust = c(1, 2, 3),
  maxiter = 0,
  initializer = "LAB",
  fasttol = 1,
  numsamples = 5,
  sampling = 0.25,
  independent = FALSE,
  algorithm_in_output = TRUE
)

Arguments

`dissimilarity`	the output object from `dissimilarity()` or `similarity_to_dissimilarity()`, or a `dist` object. If a `data.frame` is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
`index`	name or number of the dissimilarity column to use. By default, the third column name of `dissimilarity` is used.
`seed`	seed for the random number generator (`NULL` by default, in which case a random seed is used).
`n_clust`	an `integer` or an `integer` vector specifying the requested number(s) of clusters.
`maxiter`	an `integer` defining the maximum number of iterations.
`initializer`	a `character`, either 'BUILD' (used in the classic PAM algorithm) or 'LAB' (linear approximative BUILD, the default).
`fasttol`	positive `numeric` defining the tolerance for fast swapping behavior, set to 1 by default.
`numsamples`	positive `integer` defining the number of samples to draw (5 by default).
`sampling`	positive `numeric` defining the sampling rate (0.25 by default).
`independent`	a `boolean`; if `TRUE`, the previous medoids are not kept in the next sample (`FALSE` by default).
`algorithm_in_output`	a `boolean` indicating whether the original output of `fastclara` should be included in the output (`TRUE` by default, see Value).
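
To illustrate the `data.frame` layout described for `dissimilarity` (two columns of site pairs followed by the dissimilarity column), the base R sketch below converts a `dist` object into such a long table. The toy data and the column names `Site1`, `Site2` and `Euclidean` are assumptions made for this example, not names required by the package.

```r
# Toy site-by-variable matrix (hypothetical data)
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE,
            dimnames = list(paste0("Site", 1:3), NULL))

d <- dist(x)       # Euclidean distances between sites
m <- as.matrix(d)  # full symmetric matrix with site names

# Keep each pair of sites once (lower triangle of the matrix)
pairs <- which(lower.tri(m), arr.ind = TRUE)

dissim_df <- data.frame(
  Site1 = rownames(m)[pairs[, "col"]],
  Site2 = rownames(m)[pairs[, "row"]],
  Euclidean = m[pairs],  # third column: the dissimilarity index
  stringsAsFactors = FALSE
)
dissim_df
```

With a table of this shape, the default `index = names(dissimilarity)[3]` would select the `Euclidean` column.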

Details

This function is based on the `fastclara` function of the fastkmedoids package.
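
The general CLARA idea can be sketched as follows: run PAM on several random subsamples, then keep the set of medoids that gives the lowest total dissimilarity over the full data set. This is only an illustration under stated assumptions. It uses `pam()` from the cluster package rather than `fastclara`, it treats `sampling` as the fraction of sites drawn in each sample, and it does not reproduce options such as `initializer`, `fasttol` or `independent`. The toy data and all variable names are hypothetical.

```r
library(cluster)  # provides pam(); not the fastkmedoids implementation

set.seed(1)
# Toy data: two groups of 50 points in two dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
d <- as.matrix(dist(x))
n <- nrow(d)

k <- 2             # number of clusters
numsamples <- 5    # number of subsamples to draw
sampling <- 0.25   # assumed here to be the fraction of points per subsample

best_cost <- Inf
best_medoids <- NULL
for (s in seq_len(numsamples)) {
  # Draw a subsample and run PAM on its dissimilarities only
  idx <- sample(n, size = ceiling(sampling * n))
  fit <- pam(as.dist(d[idx, idx]), k = k, diss = TRUE)
  medoids <- idx[fit$id.med]  # map medoids back to full-data indices

  # Cost over ALL points: dissimilarity to the nearest medoid
  cost <- sum(apply(d[, medoids, drop = FALSE], 1, min))
  if (cost < best_cost) {
    best_cost <- cost
    best_medoids <- medoids
  }
}

# Assign every point to its nearest medoid from the best subsample
clusters <- apply(d[, best_medoids, drop = FALSE], 1, which.min)
table(clusters)
```

Because PAM only ever sees a subsample, the cost of each run grows with the subsample size rather than with the full data set, which is the motivation for CLARA on large inputs.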

Value

A list of class `bioregion.clusters` with five slots:

`name`: `character` containing the name of the algorithm
`args`: `list` of input arguments as provided by the user
`inputs`: `list` of characteristics of the clustering process
`algorithm`: `list` of all objects associated with the clustering procedure, such as original cluster objects (only if `algorithm_in_output = TRUE`)
`clusters`: `data.frame` containing the clustering results

If `algorithm_in_output = TRUE`, the `algorithm` slot contains the output of `fastclara`.
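
As a hedged illustration of how the returned object might be inspected, the sketch below requests two partitions at once and reads the slots listed above with `$`. It assumes the bioregion package is installed; the `set.seed(1)` call, the `seed = 1` value and the object name `res` are choices made for this example only.

```r
library(bioregion)

set.seed(1)  # makes the simulated matrix reproducible (example choice)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")

# Request two partitions (3 and 5 clusters) with a fixed seed
res <- nhclu_clara(dissim, index = "Simpson", n_clust = c(3, 5), seed = 1)

class(res)          # class of the returned list
res$name            # name of the algorithm
names(res$args)     # arguments as provided by the user
head(res$clusters)  # clustering results, as a data.frame
```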

Author(s)

Pierre Denelle (pierre.denelle@gmail.com), Boris Leroy (leroy.boris@gmail.com), and Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Schubert E, Rousseeuw PJ (2019). “Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms.” Similarity Search and Applications, 11807, 171–187.

Examples

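# Simulate a site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Compute all available dissimilarity metrics between sites
dissim <- dissimilarity(comat, metric = "all")

# CLARA clustering into 5 clusters, based on the Simpson dissimilarity
clust1 <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)

# Evaluate the resulting partition
partition_metrics(clust1, dissimilarity = dissim,
                  eval_metric = "pc_distance")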

[Package bioregion version 1.1.1 Index]