delineate_with_similarity {maldipickr}    R Documentation

Delineate clusters from a similarity matrix

Description

From a matrix of similarities between spectra (e.g., computed with the cosine metric or the Pearson product-moment correlation), infer species clusters using a threshold at or above which spectra are considered alike.

Usage

delineate_with_similarity(sim_matrix, threshold, method = "complete")

Arguments

sim_matrix

An n × n similarity matrix, where n is the number of spectra. Column names should match the row names.

threshold

A numeric value indicating the minimal similarity between two spectra for them to be considered alike. Adjust it according to the similarity metric used.

method

The method of hierarchical clustering to use. The default and recommended method is "complete", but any method accepted by stats::hclust is valid.

Details

The similarity matrix is converted to a distance matrix by subtracting each similarity from one (distance = 1 - similarity). This approach works for cosine similarity and positive correlations, which have an upper bound of 1. Clusters are then delineated using hierarchical clustering. The default method is complete linkage (also known as farthest neighbor clustering), which ensures that the minimum within-group similarity of each cluster respects the threshold. See the Details section of stats::hclust for other valid methods.
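
The conversion and clustering steps described above can be sketched with base R alone. This is a minimal illustration and not the package's implementation: the helper name sketch_delineate and the three-spectrum toy matrix are assumptions made for this example, and the tree is assumed to be cut at a height of 1 - threshold.

```r
# Sketch only: `sketch_delineate` is a hypothetical helper,
#  not the function shipped in the package.
sketch_delineate <- function(sim_matrix, threshold, method = "complete") {
  # Similarity -> distance: identical spectra (similarity 1) get distance 0
  dist_matrix <- stats::as.dist(1 - sim_matrix)
  # Agglomerative clustering with the chosen linkage
  tree <- stats::hclust(dist_matrix, method = method)
  # Assumption: cut the tree at the distance matching the similarity threshold
  stats::cutree(tree, h = 1 - threshold)
}

# Toy matrix (made up): "a" and "b" are alike, "c" is distinct
sim <- matrix(
  c(
    1.00, 0.95, 0.50,
    0.95, 1.00, 0.40,
    0.50, 0.40, 1.00
  ),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
# Named integer vector of cluster memberships, one entry per spectrum
membership <- sketch_delineate(sim, threshold = 0.92)
```

With complete linkage, two groups merge at a height equal to the largest pairwise distance between their members, so every pair of spectra left in the same cluster after the cut has a similarity of at least the threshold. Other linkages, such as "single", do not give this guarantee.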

Value

A tibble with n rows, one per spectrum, and 3 columns:

See Also

For similarity metrics: coop::tcosine, stats::cor, Hmisc::rcorr. To delineate clusters using taxonomic identifications: delineate_with_identification. For further analyses: set_reference_spectra.

Examples

# Toy similarity matrix between the six example spectra of
#  three species. The cosine metric is used and a value of
#  zero indicates dissimilar spectra and a value of one
#  indicates identical spectra.
cosine_similarity <- matrix(
  c(
    1, 0.79, 0.77, 0.99, 0.98, 0.98,
    0.79, 1, 0.98, 0.79, 0.8, 0.8,
    0.77, 0.98, 1, 0.77, 0.77, 0.77,
    0.99, 0.79, 0.77, 1, 1, 0.99,
    0.98, 0.8, 0.77, 1, 1, 1,
    0.98, 0.8, 0.77, 0.99, 1, 1
  ),
  nrow = 6,
  dimnames = list(
    c(
      "species1_G2", "species2_E11", "species2_E12",
      "species3_F7", "species3_F8", "species3_F9"
    ),
    c(
      "species1_G2", "species2_E11", "species2_E12",
      "species3_F7", "species3_F8", "species3_F9"
    )
  )
)
# Delineate clusters based on a 0.92 threshold applied
#  to the similarity matrix
delineate_with_similarity(cosine_similarity, threshold = 0.92)
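
When no similarity matrix is available yet, one can be computed from a matrix of intensities. The sketch below uses base R as a stand-in for the functions listed under See Also. The intensity values and spectrum names are made up for illustration, and the layout (spectra as rows, m/z bins as columns) is an assumption of this example.

```r
# Hypothetical intensity matrix: rows are spectra, columns are m/z bins
intensities <- rbind(
  spectrum_a = c(1, 2, 0, 4),
  spectrum_b = c(2, 4, 0, 8),
  spectrum_c = c(0, 0, 5, 0)
)
# Cosine similarity between rows: dot products scaled by the row norms
row_norms <- sqrt(rowSums(intensities^2))
toy_similarity <- tcrossprod(intensities) / tcrossprod(row_norms)
# Row and column names are inherited from the spectra names, as
#  expected for the sim_matrix argument
toy_similarity
```

Here spectrum_a and spectrum_b are proportional and therefore have a cosine similarity of one, while spectrum_c shares no peak with them and has a similarity of zero.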

[Package maldipickr version 1.3.0 Index]