tok-package {tok}    R Documentation

tok: Fast Text Tokenization

Description

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is extremely fast at both training new vocabularies and tokenizing text.
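
To make the 'Byte-Pair Encoding' idea concrete, the following is a minimal base R sketch of a single training step: count adjacent symbol pairs, pick the most frequent one, and merge it everywhere. It is an illustration only and does not call the tok package or the 'Hugging Face' library. The helper names count_pairs and merge_pair and the three-word corpus are hypothetical, and the sketch assumes symbols never contain a space.

```r
# Toy illustration of one Byte-Pair Encoding training step in base R.
# This does NOT use the tok package; all function names are hypothetical.

# Count adjacent symbol pairs across a list of words, where each word is
# a character vector of symbols. Pairs are encoded as "a b".
count_pairs <- function(words) {
  pairs <- character(0)
  for (w in words) {
    if (length(w) > 1) {
      pairs <- c(pairs, paste(w[-length(w)], w[-1]))
    }
  }
  table(pairs)
}

# Merge every non-overlapping occurrence of the pair (a, b) in one word
# into the single symbol "ab", scanning left to right.
merge_pair <- function(w, a, b) {
  out <- character(0)
  i <- 1
  while (i <= length(w)) {
    if (i < length(w) && w[i] == a && w[i + 1] == b) {
      out <- c(out, paste0(a, b))
      i <- i + 2
    } else {
      out <- c(out, w[i])
      i <- i + 1
    }
  }
  out
}

# Start from single characters (assumed toy corpus).
words <- strsplit(c("low", "lower", "lowest"), "")

# Pick the most frequent pair; ties resolve to the first name in sort order.
counts <- count_pairs(words)
best <- names(counts)[which.max(counts)]
parts <- strsplit(best, " ")[[1]]

# Apply the merge to every word.
merged <- lapply(words, merge_pair, a = parts[1], b = parts[2])
print(merged)
```

A real trainer repeats this step until the vocabulary reaches a target size and works on bytes with word frequencies rather than a plain list of words. That repeated loop is the kind of work the package delegates to the compiled 'Hugging Face' library.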

Author(s)

Maintainer: Daniel Falbel <daniel@posit.co>

[Package tok version 0.1.3 Index]