find_clusters {biosurvey}    R Documentation

Detection of clusters in 2D spaces

Description

Finds clusters of data in two dimensions using either hierarchical or k-means clustering.

Usage

find_clusters(data, x_column, y_column, space,
              cluster_method = "hierarchical", n_k_means = NULL,
              split_distance = NULL)

Arguments

data

matrix or data.frame that contains at least two columns.

x_column

(character) name of the column in data containing the x-axis values.

y_column

(character) name of the column in data containing the y-axis values.

space

(character) space in which clusters will be detected. Options are "G" for geographic space and "E" for environmental space.

cluster_method

(character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical".

n_k_means

(numeric) number of clusters to be identified when cluster_method = "k-means"; default = NULL.

split_distance

(numeric) distance used to identify clusters when cluster_method = "hierarchical": in meters if space = "G", or Euclidean distance if space = "E"; default = NULL.
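
The two distance notions mentioned for split_distance can be illustrated with a minimal base R sketch. The helper haversine_m below is hypothetical and is not part of biosurvey; it assumes a spherical Earth, and the package may compute geographic distances differently.

```r
# Hypothetical helper (not from biosurvey): great-circle (haversine) distance
# in meters between longitude/latitude points given in decimal degrees.
# Assumes a spherical Earth with a mean radius of 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Geographic space ("G"): one degree of longitude at the equator, in meters
d_geo <- haversine_m(0, 0, 1, 0)

# Environmental space ("E"): plain Euclidean distance between two points
env <- rbind(c(PC1 = 0, PC2 = 0), c(PC1 = 3, PC2 = 4))
d_env <- as.numeric(dist(env))

d_geo
d_env
```

A value of split_distance should therefore be chosen on the scale of the space in use: thousands of meters may be sensible in "G", whereas values in "E" depend on the units of the environmental axes.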

Details

Clustering methods make distinct assumptions and one of them may perform better than the other depending on the pattern of the data.

The k-means method tends to perform better when data form compact, roughly spherical groups of similar size. The hierarchical clustering algorithm usually takes more time than the k-means method.
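
The contrast between the two approaches can be sketched with base R functions from the stats package. This is an illustration under stated assumptions, not the internal code of find_clusters: the toy data, the complete-linkage choice, and the cut height of 5 are all chosen here for demonstration.

```r
set.seed(1)

# Toy data (made up for illustration): two well-separated groups in 2D
toy <- rbind(
  cbind(x = rnorm(20, mean = 0, sd = 0.3), y = rnorm(20, mean = 0, sd = 0.3)),
  cbind(x = rnorm(20, mean = 10, sd = 0.3), y = rnorm(20, mean = 10, sd = 0.3))
)

# Hierarchical: build the tree, then cut it at a distance threshold
# (similar in spirit to split_distance; complete linkage is an assumption)
tree <- hclust(dist(toy), method = "complete")
hier_clusters <- cutree(tree, h = 5)

# k-means: the number of clusters is fixed in advance
# (similar in spirit to n_k_means)
km <- kmeans(toy, centers = 2, nstart = 10)
km_clusters <- km$cluster

# Cross-tabulate the two assignments to see whether they agree
table(hierarchical = hier_clusters, kmeans = km_clusters)
```

In the hierarchical case the number of clusters follows from the distance threshold, whereas in k-means it must be supplied; this is why the two methods take different arguments.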

Value

A data frame containing data and an additional column defining the cluster to which each row belongs.

Examples

# Data
data("m_matrix", package = "biosurvey")

# Cluster detection
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)
head(clusters)
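
For comparison, a k-means call on the same data might look like the sketch below. The value n_k_means = 4 is an arbitrary choice for illustration, and the call is guarded so that it runs only if biosurvey is installed.

```r
# Cluster detection with k-means (sketch; n_k_means = 4 is an arbitrary choice)
if (requireNamespace("biosurvey", quietly = TRUE)) {
  library(biosurvey)
  data("m_matrix", package = "biosurvey")

  clusters_km <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                               y_column = "PC2", space = "E",
                               cluster_method = "k-means", n_k_means = 4,
                               split_distance = NULL)
  head(clusters_km)
}
```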

[Package biosurvey version 0.1.1 Index]