tok-package {tok} | R Documentation
tok: Fast Text Tokenization
Description
Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers, such as the 'Byte-Pair Encoding' algorithm (<https://huggingface.co/docs/tokenizers/index>). It is extremely fast at both training new vocabularies and tokenizing texts.
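As a rough illustration of tokenizing text, the sketch below loads a pretrained tokenizer, encodes a sentence into token ids, and decodes the ids back to text. It assumes the package is installed and that network access is available to download the tokenizer files. The "gpt2" identifier and the variable names (tk, enc, ids, text) are illustrative choices, not part of this help page.

```r
# Minimal sketch: assumes the tok package is installed and that the
# "gpt2" tokenizer (an illustrative choice) can be downloaded.
library(tok)

# Load a pretrained tokenizer by its Hugging Face identifier.
tk <- tokenizer$from_pretrained("gpt2")

# Encode a sentence; the returned encoding exposes the token ids.
enc <- tk$encode("Hello world! You can use tokenizers from R")
ids <- enc$ids
print(ids)

# Decode the ids back into a string.
text <- tk$decode(ids)
print(text)
```

The ids printed depend on the vocabulary of the tokenizer that was loaded, so a different identifier will generally produce different values.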
Author(s)
Maintainer: Daniel Falbel <daniel@posit.co>
Other contributors:
Posit [copyright holder]
See Also
Useful links:
[Package tok version 0.1.3 Index]