R: Non hierarchical clustering: k-means analysis

nhclu_kmeans {bioregion}

R Documentation

Non hierarchical clustering: k-means analysis

Description

This function performs non-hierarchical clustering on the basis of dissimilarity, using a k-means analysis.

Usage

nhclu_kmeans(
  dissimilarity,
  index = names(dissimilarity)[3],
  seed = NULL,
  n_clust = c(1, 2, 3),
  iter_max = 10,
  nstart = 10,
  algorithm = "Hartigan-Wong",
  algorithm_in_output = TRUE
)

Arguments

`dissimilarity`	the output object from `dissimilarity()` or `similarity_to_dissimilarity()`, or a `dist` object. If a `data.frame` is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
`index`	name or number of the dissimilarity column to use. By default, the third column name of `dissimilarity` is used.
`seed`	seed for the random number generator (`NULL` by default, in which case no seed is set and results vary between runs).
`n_clust`	an `integer` or an `integer` vector specifying the requested number(s) of clusters.
`iter_max`	an `integer` specifying the maximum number of iterations for the k-means method (see `kmeans`).
`nstart`	an `integer` specifying how many random sets of `n_clust` should be selected as starting points for the k-means analysis (see `kmeans`).
`algorithm`	a `character` specifying the algorithm to use for k-means (see `kmeans`). Available options are Hartigan-Wong, Lloyd, Forgy and MacQueen.
`algorithm_in_output`	a `logical` indicating whether the original output of `kmeans` should be returned in the output (`TRUE` by default, see Value).
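
The roles of `seed`, `iter_max`, `nstart` and `algorithm` can be illustrated directly with `stats::kmeans`, to which the descriptions above refer. The following is a minimal sketch, assuming that `seed` acts like a call to `set.seed()` before the k-means run (an assumption, not confirmed by this page); `pts` and `run_kmeans` are hypothetical names used only for illustration.

```r
# Toy coordinates: 30 points in 2 dimensions (made-up data).
set.seed(1)
pts <- matrix(rnorm(60), ncol = 2)

# Hypothetical helper: fix the seed, then run k-means with settings that
# mirror iter_max = 10, nstart = 10 and algorithm = "Hartigan-Wong".
run_kmeans <- function(seed) {
  set.seed(seed)
  stats::kmeans(pts, centers = 3, iter.max = 10, nstart = 10,
                algorithm = "Hartigan-Wong")$cluster
}

a <- run_kmeans(42)
b <- run_kmeans(42)

# The same seed gives the same random starting centers, hence the same partition.
identical(a, b)
```

Without a fixed seed, the random starting centers differ between runs, so cluster labels (and occasionally cluster membership) may change.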

Details

This method partitions the data into k groups such that the sum of squares of Euclidean distances from points to their assigned cluster centers is minimized. k-means cannot be applied directly to dissimilarity/beta-diversity metrics, because these distances are not Euclidean. Therefore, the dissimilarity matrix is first transformed with a Principal Coordinate Analysis (using the function pcoa), and k-means is then applied to the coordinates of the points in the PCoA. Because this adds a transformation of the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) should be preferred.
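
The two-step procedure described above can be sketched in base R. This is an illustration under stated assumptions rather than the package's implementation: `stats::cmdscale` (classical scaling) stands in for the `pcoa` function mentioned above, the Jaccard dissimilarity from `dist(method = "binary")` stands in for any non-Euclidean index, and the number of retained axes (`k = 5`) is an arbitrary choice. All object names are hypothetical.

```r
# Toy presence/absence data: 20 sites x 25 species (made-up data).
set.seed(1)
comat <- matrix(rpois(20 * 25, lambda = 0.7), nrow = 20, ncol = 25)
rownames(comat) <- paste0("Site", 1:20)

# Step 1: a non-Euclidean dissimilarity between sites (Jaccard on
# presence/absence), used here as a stand-in for a beta-diversity index.
d <- dist((comat > 0) * 1, method = "binary")

# Step 2: Principal Coordinate Analysis via classical scaling.
# The resulting coordinates live in a Euclidean space.
coords <- stats::cmdscale(d, k = 5)

# Step 3: k-means on the PCoA coordinates instead of the raw dissimilarities.
km <- stats::kmeans(coords, centers = 3, iter.max = 10, nstart = 10)

# One cluster label per site.
km$cluster
```

Only the axes retained in step 2 inform the clustering, which is one reason the extra transformation can lose information compared with a method that works on the dissimilarities directly.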

Value

A list of class bioregion.clusters with five slots:

name: character containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results

In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of kmeans.

Author(s)

Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

comnet <- mat_to_net(comat)

dissim <- dissimilarity(comat, metric = "all")

clust1 <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_kmeans(dissim, n_clust = 2:15, index = "Simpson")
partition_metrics(clust2, dissimilarity = dissim,
                  eval_metric = "pc_distance")

partition_metrics(clust2, net = comnet, species_col = "Node2",
                  site_col = "Node1", eval_metric = "avg_endemism")

[Package bioregion version 1.1.1]