hash {insect}    R Documentation

Convert sequences to MD5 hashes.

Description

This function converts DNA or amino acid sequences to 128-bit MD5 hash values for efficient duplicate identification and dereplication.

Usage

hash(x)

Arguments

x

a sequence or list of sequences, given as character strings, character vectors, or raw bytes (e.g. DNAbin or AAbin objects).

Details

This function uses the md5 function from the openssl package (Ooms 2017), an R interface to the OpenSSL library (https://www.openssl.org/), to digest each sequence to a 128-bit hash. The resulting hashes can be compared with base functions such as duplicated and unique to identify and manage duplicate sequences quickly in large datasets.
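
As a rough illustration of the dereplication workflow described above, the following sketch hashes the whales example data and uses duplicated and match to build a set of unique sequences plus a pointer from each original sequence to its representative. The object names (dups, whales_unique, pointers) are hypothetical and chosen only for this example, and the sketch assumes that whales can be subset with a logical index like an ordinary list.

```r
library(insect)

data(whales)

# Digest every sequence to its MD5 hash (a character vector).
hashes <- hash(whales)

# TRUE for any sequence whose hash matches an earlier sequence.
dups <- duplicated(hashes)

# Keep only the first occurrence of each distinct sequence
# (assumes logical subsetting works on this object, as for a list).
whales_unique <- whales[!dups]

# For each original sequence, the position of its representative
# in the dereplicated set, so results can be mapped back later.
pointers <- match(hashes, hashes[!dups])
```

Comparing short hashes rather than full sequences keeps the duplicate search cheap, and the pointers vector allows any analysis run on whales_unique to be expanded back to the full dataset.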

Value

a character vector of MD5 hashes, one for each input sequence.

Author(s)

Shaun Wilkinson

References

Ooms J (2017) openssl: toolkit for encryption, signatures and certificates based on OpenSSL. R package version 0.9.7. https://CRAN.R-project.org/package=openssl

Examples

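 ## load the example whale sequences
 data(whales)
 ## digest each sequence to an MD5 hash
 hashes <- hash(whales)
 ## count sequences that duplicate an earlier one
 sum(duplicated(hashes))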

[Package insect version 1.4.2 Index]