hash {insect}    R Documentation
Convert sequences to MD5 hashes.
Description
This function converts DNA or amino acid sequences to 128-bit MD5 hash values for efficient duplicate identification and dereplication.
Usage
hash(x)
Arguments
x    a sequence or list of sequences, supplied as a character string, a character vector, or raw bytes (e.g. DNAbin or AAbin objects).
Details
This function uses the md5 function from the openssl package (Ooms 2017), an R interface to the OpenSSL library (https://www.openssl.org/), to digest sequences to 128-bit hashes. Because identical sequences always produce identical hashes, the hashes can be compared using base functions such as duplicated and unique to identify and manage duplicate sequences quickly in large datasets.
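A minimal sketch of the hash-then-compare approach described above. It calls openssl::md5 directly on a small character vector rather than calling hash() itself, so it illustrates the idea only and is not necessarily how hash() treats lists or raw (DNAbin/AAbin) input. The sequence names and data are hypothetical.

```r
library(openssl)

# Hypothetical toy sequences; seq2 is an exact copy of seq1
seqs <- c(seq1 = "ACGTACGTTAGC",
          seq2 = "ACGTACGTTAGC",
          seq3 = "TTGACCGATCGA")

# md5() is vectorised over character vectors: one 32-character hex hash per string.
# as.character() drops the "hash" class so the result is a plain character vector.
hashes <- as.character(md5(seqs))

# duplicated() flags every hash already seen earlier in the vector
dups <- duplicated(hashes)

# Dereplicate by keeping only the first occurrence of each sequence
derep <- seqs[!dups]

# unique() returns the distinct hashes themselves
n_unique <- length(unique(hashes))
```

Keeping the first occurrence with !duplicated() preserves the original names and order of the retained sequences, whereas unique() returns only the distinct hash values and not their positions.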
Value
A character vector of MD5 hashes, one per sequence.
Author(s)
Shaun Wilkinson
References
Ooms J (2017) openssl: toolkit for encryption, signatures and certificates based on OpenSSL. R package version 0.9.7. https://CRAN.R-project.org/package=openssl
Examples
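## load the example whale sequence dataset
data(whales)
## digest each sequence to an MD5 hash
hashes <- hash(whales)
## count the sequences that duplicate an earlier one
sum(duplicated(hashes))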
[Package insect version 1.4.2 Index]