nhclu_pam {bioregion}

R Documentation

Non-hierarchical clustering: partitioning around medoids

Description

This function performs non-hierarchical clustering on a dissimilarity matrix using partitioning around medoids (PAM).

Usage

nhclu_pam(
  dissimilarity,
  index = names(dissimilarity)[3],
  seed = NULL,
  n_clust = c(1, 2, 3),
  variant = "faster",
  nstart = 1,
  cluster_only = FALSE,
  algorithm_in_output = TRUE,
  ...
)

Arguments

`dissimilarity`	the output object from `dissimilarity()` or `similarity_to_dissimilarity()`, or a `dist` object. If a `data.frame` is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
`index`	name or number of the dissimilarity column to use. By default, the third column name of `dissimilarity` is used.
`seed`	seed for the random number generator (`NULL` by default, meaning no fixed seed is used).
`n_clust`	an `integer` or an `integer` vector specifying the requested number(s) of clusters.
`variant`	a `character` string specifying the variant of pam to use, by default `faster`. Available options are `original`, `o_1`, `o_2`, `f_3`, `f_4`, `f_5` or `faster`. See pam for more details.
`nstart`	an `integer` specifying the number of random starts for the pam algorithm. Defaults to 1 (for the `faster` variant).
`cluster_only`	a `boolean` specifying whether only the clustering should be returned from the pam function (more efficient).
`algorithm_in_output`	a `boolean` indicating if the original output of pam should be returned in the output (`TRUE` by default, see Value).
`...`	further arguments passed to `pam()` (see pam).

Details

This method partitions the data into the chosen number of clusters on the basis of the input dissimilarity matrix. Each cluster is represented by a medoid, that is, one of the observed sites. The method is more robust than k-means because it minimizes the sum of dissimilarities between each point and the medoid of its cluster, whereas k-means minimizes the sum of squared Euclidean distances to the cluster centres. For this reason, k-means cannot be applied directly to the input dissimilarity matrix when the distances are not Euclidean.
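
The contrast between the two objectives can be written out explicitly. The notation below is illustrative and not taken from the package documentation: $x_1,\dots,x_n$ are the sites, $d$ is the input dissimilarity, $m_1,\dots,m_k$ are medoids chosen among the sites, and $c_1,\dots,c_k$ are k-means centres that may lie anywhere in a $p$-dimensional Euclidean space.

```latex
\begin{align}
\text{PAM:} \quad & \min_{m_1,\dots,m_k \,\in\, \{x_1,\dots,x_n\}} \; \sum_{i=1}^{n} \min_{1 \le j \le k} d(x_i, m_j) \\
\text{k-means:} \quad & \min_{c_1,\dots,c_k \,\in\, \mathbb{R}^p} \; \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2
\end{align}
```

The PAM objective only evaluates $d$ between pairs of observed sites, so it needs nothing beyond the dissimilarity matrix. The k-means objective needs coordinates for the sites in order to place the centres $c_j$.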

Value

A list of class bioregion.clusters with five slots:

name: character containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results

In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of pam.
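
To give an idea of what a pam output looks like, the sketch below calls `pam()` directly on a small `dist` object. It assumes that pam refers to `pam()` from the cluster package and that the installed version supports the `variant` argument. The toy data and the object names (`pts`, `d`, `fit`) are hypothetical, and how `nhclu_pam()` arranges this output inside the algorithm slot is not shown here.

```r
# Assumption: pam() comes from the 'cluster' package.
library(cluster)

set.seed(42)  # reproducible toy data

# Two well-separated groups of 10 hypothetical sites each
pts <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)
rownames(pts) <- paste0("Site", 1:20)

# Pairwise Euclidean distances stored as a 'dist' object
d <- dist(pts)

# Partition around 2 medoids, treating the input as dissimilarities
fit <- pam(d, k = 2, diss = TRUE, variant = "faster")

fit$medoids     # labels of the sites selected as medoids
fit$clustering  # named integer vector giving the cluster of each site
fit$objective   # objective function value(s) reported by pam()
```

Setting `cluster.only = TRUE` in `pam()` returns only the clustering vector, which is the behaviour the `cluster_only` argument above refers to.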

Author(s)

Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Kaufman L, Rousseeuw PJ (2009). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.

Examples

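library(bioregion)

# Simulate a site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Network (site-species pairs) and pairwise dissimilarities between sites
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")

# PAM clustering for several numbers of clusters, using the Simpson index
clust1 <- nhclu_pam(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")

# Evaluate the partitions stored in clust2
partition_metrics(clust2, dissimilarity = dissim,
                  eval_metric = "pc_distance")
partition_metrics(clust2, net = comnet, species_col = "Node2",
                  site_col = "Node1", eval_metric = "avg_endemism")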

[Package bioregion version 1.1.1 Index]