nhclu_kmeans {bioregion}    R Documentation

Non-hierarchical clustering: k-means analysis

Description

This function performs non-hierarchical clustering on the basis of dissimilarity with a k-means analysis.

Usage

nhclu_kmeans(
  dissimilarity,
  index = names(dissimilarity)[3],
  n_clust = NULL,
  iter_max = 10,
  nstart = 10,
  algorithm = "Hartigan-Wong"
)

Arguments

dissimilarity

the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.

index

name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

n_clust

an integer or a vector of integers specifying the requested number(s) of clusters

iter_max

an integer specifying the maximum number of iterations for the kmeans method (see stats::kmeans())

nstart

an integer specifying how many random sets of n_clust should be selected as starting points for the kmeans analysis (see stats::kmeans())

algorithm

a character string specifying the algorithm to use for k-means (see stats::kmeans()). Available options are "Hartigan-Wong", "Lloyd", "Forgy" and "MacQueen".
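The iter_max, nstart and algorithm arguments all refer to stats::kmeans(). As a minimal base-R sketch of what they control, the call below runs stats::kmeans() on made-up two-dimensional points. The data and object names are illustrative only, and the mapping shown in the comments is an assumption drawn from the cross-references above, not a copy of the package internals.

```r
set.seed(1)

# Two well-separated toy groups of 20 points each (illustrative data)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Assumed correspondence with the arguments documented above:
#   centers   ~ one value of n_clust
#   iter.max  ~ iter_max
#   nstart    ~ nstart (number of random starting configurations)
#   algorithm ~ algorithm
km <- stats::kmeans(pts, centers = 2, iter.max = 10, nstart = 10,
                    algorithm = "Hartigan-Wong")

table(km$cluster)   # cluster sizes
km$tot.withinss     # total within-cluster sum of squares
```

A larger nstart makes a poor local optimum less likely, at the cost of running the algorithm more times.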

Details

This method partitions the data into k groups such that the sum of squared Euclidean distances from points to their assigned cluster centers is minimized. k-means cannot be applied directly to dissimilarity/beta-diversity metrics, because these distances are not Euclidean. The dissimilarity matrix is therefore first transformed with a Principal Coordinate Analysis (using the function ape::pcoa()), and k-means is then applied to the coordinates of the points in the PCoA space. Because this adds a transformation of the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam()) should be preferred.
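The two-step procedure described above can be sketched with base R alone. This is a hedged illustration, not the package's implementation: it uses stats::cmdscale() (classical scaling, which is equivalent to PCoA) in place of ape::pcoa(), the dissimilarity values are random, and keeping k = 2 axes is an arbitrary choice made for the example.

```r
set.seed(42)

# Toy long-format dissimilarity table: two site columns + one index column
# (values are made up; "Simpson" is only a column label here)
sites <- paste0("Site", 1:6)
pairs <- t(combn(sites, 2))
dissim_df <- data.frame(
  Site1 = pairs[, 1],
  Site2 = pairs[, 2],
  Simpson = runif(nrow(pairs), min = 0.1, max = 1)
)

# Rebuild the symmetric site-by-site matrix from the pairwise table
d_mat <- matrix(0, nrow = length(sites), ncol = length(sites),
                dimnames = list(sites, sites))
d_mat[cbind(dissim_df$Site1, dissim_df$Site2)] <- dissim_df$Simpson
d_mat[cbind(dissim_df$Site2, dissim_df$Site1)] <- dissim_df$Simpson

# Step 1: classical scaling (PCoA) gives Euclidean coordinates for each site
coords <- stats::cmdscale(stats::as.dist(d_mat), k = 2)

# Step 2: k-means on those coordinates
km <- stats::kmeans(coords, centers = 2, iter.max = 10, nstart = 10)

data.frame(site = rownames(coords), cluster = km$cluster)
```

The clusters are computed on the ordination coordinates rather than on the original dissimilarities, which is the extra transformation the text warns about.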

Value

A list of class bioregion.clusters with five slots:

  1. name: character string containing the name of the algorithm

  2. args: list of input arguments as provided by the user

  3. inputs: list of characteristics of the clustering process

  4. algorithm: list of all objects associated with the clustering procedure, such as original cluster objects

  5. clusters: data.frame containing the clustering results

Author(s)

Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)

See Also

nhclu_pam

cut_tree

Examples

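# Simulate a site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Network (long) format of the same data, used below for the endemism metric
comnet <- mat_to_net(comat)

# Pairwise dissimilarities between sites, computed for all available metrics
dissim <- dissimilarity(comat, metric = "all")

# k-means partitions for several numbers of clusters, using the Simpson index
clust1 <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_kmeans(dissim, n_clust = 2:15, index = "Simpson")

# Evaluate the partitions with a dissimilarity-based metric
partition_metrics(clust2, dissimilarity = dissim,
                  eval_metric = "pc_distance")

# Evaluate the partitions with a metric based on the site-species network
partition_metrics(clust2, net = comnet, species_col = "Node2",
                  site_col = "Node1", eval_metric = "avg_endemism")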


[Package bioregion version 1.1.0 Index]