find_clusters {biosurvey} | R Documentation |
Finds clusters of data in two dimensions using one of two methods: hierarchical clustering or k-means.
find_clusters(data, x_column, y_column, space,
cluster_method = "hierarchical", n_k_means = NULL,
split_distance = NULL)
data |
matrix or data.frame that contains at least two columns. |
x_column |
(character) name of the column in data that contains the x-axis values. |
y_column |
(character) name of the column in data that contains the y-axis values. |
space |
(character) space in which clusters will be detected. Two options are available: "G" for geographic space and "E" for environmental space. |
cluster_method |
(character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical". |
n_k_means |
(numeric) number of clusters to be identified when cluster_method = "k-means". |
split_distance |
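(numeric) distance in meters (if space = "G") or Euclidean distance (if space = "E") used to identify clusters when cluster_method = "hierarchical". |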
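The difference between a distance threshold and a fixed number of clusters can be sketched with base R alone. The snippet below is a minimal illustration, not the internals of find_clusters: the simulated points, the object names, and the use of hclust() with cutree() are all assumptions made for the example. Cutting the tree at a height mirrors what split_distance expresses, while cutting into a fixed number of groups mirrors n_k_means.

```r
# Hypothetical 2D data: two tight groups of points roughly 14 units apart.
set.seed(1)
xy <- rbind(
  cbind(x = rnorm(20, mean = 0, sd = 0.3), y = rnorm(20, mean = 0, sd = 0.3)),
  cbind(x = rnorm(20, mean = 10, sd = 0.3), y = rnorm(20, mean = 10, sd = 0.3))
)

# Build the tree from pairwise Euclidean distances (complete linkage is the default).
tree <- hclust(dist(xy))

# Cut by height: points end up in the same cluster only if the whole
# cluster fits within the threshold (analogous to split_distance).
by_distance <- cutree(tree, h = 4)

# Cut by count: ask for a fixed number of groups (analogous to n_k_means).
by_count <- cutree(tree, k = 2)

# Cross-tabulate the two partitions.
table(by_distance, by_count)
```

With groups this well separated, a threshold of 4 and a request for two groups yield the same partition; on real data the two settings generally need to be tuned separately.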
Clustering methods make distinct assumptions and one of them may perform better than the other depending on the pattern of the data.
The k-means method tends to perform better when data form compact, roughly spherical groups and clusters are of similar size. Hierarchical clustering usually takes more time than k-means.
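As a rough illustration of the favorable case described above, the following base R sketch runs both approaches on simulated, compact groups of equal size. The data and object names are hypothetical, and stats::kmeans() and stats::hclust() are used here as stand-ins; this is not claimed to be how find_clusters is implemented.

```r
# Hypothetical data: two compact, equally sized groups in two dimensions.
set.seed(42)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)
colnames(pts) <- c("x", "y")

# k-means: the number of clusters is fixed in advance; nstart repeats the
# random initialization and keeps the best solution.
km <- kmeans(pts, centers = 2, nstart = 10)

# Hierarchical clustering: build the full tree, then cut it into two groups.
hc <- cutree(hclust(dist(pts)), k = 2)

# Cross-tabulate the two partitions to see whether they agree.
agreement <- table(kmeans = km$cluster, hierarchical = hc)
agreement
```

When the groups are elongated, unevenly sized, or overlapping, the two partitions can diverge, so it is worth comparing them on the data at hand.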
A data frame containing data and an additional column defining clusters.
# Data
data("m_matrix", package = "biosurvey")
# Cluster detection
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)
head(clusters)
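For comparison, a k-means run on the same data might look like the sketch below. The value n_k_means = 3 is an arbitrary choice made for illustration, and the requireNamespace() guard is added only so the snippet does nothing when biosurvey is not installed.

```r
# Stays NULL when biosurvey is not installed.
clusters_km <- NULL

if (requireNamespace("biosurvey", quietly = TRUE)) {
  data("m_matrix", package = "biosurvey")

  # k-means needs the number of clusters; split_distance keeps its default (NULL).
  # n_k_means = 3 is an illustrative value, not a recommendation.
  clusters_km <- biosurvey::find_clusters(m_matrix$data_matrix,
                                          x_column = "PC1", y_column = "PC2",
                                          space = "E",
                                          cluster_method = "k-means",
                                          n_k_means = 3)
  head(clusters_km)
}
```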