tok-package |
tok: Fast Text Tokenization |
decoder_byte_level |
Byte-level decoder |
encoding |
Encoding |
model_bpe |
BPE model |
model_unigram |
An implementation of the Unigram algorithm |
model_wordpiece |
An implementation of the WordPiece algorithm |
normalizer_nfc |
NFC normalizer |
normalizer_nfkc |
NFKC normalizer |
pre_tokenizer |
Generic class for pre-tokenizers |
pre_tokenizer_byte_level |
Byte-level pre-tokenizer |
pre_tokenizer_whitespace |
Whitespace pre-tokenizer; splits input using the regex \w+|[^\w\s]+ |
processor_byte_level |
Byte-level post-processor |
tok |
tok: Fast Text Tokenization |
tokenizer |
Tokenizer |
tok_decoder |
Generic class for decoders |
tok_model |
Generic class for tokenization models |
tok_normalizer |
Generic class for normalizers |
tok_processor |
Generic class for processors |
tok_trainer |
Generic training class |
trainer_bpe |
BPE trainer |
trainer_unigram |
Unigram tokenizer trainer |
trainer_wordpiece |
WordPiece tokenizer trainer |
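
The regex quoted in the pre_tokenizer_whitespace entry, \w+|[^\w\s]+, matches either a run of word characters or a run of characters that are neither word characters nor whitespace. The sketch below illustrates that splitting rule with Python's standard `re` module only. It does not call the tok package, and the helper name `whitespace_split` is hypothetical; the package's own pre-tokenizer may differ in details such as Unicode handling or the shape of the offsets it returns.

```python
import re

# Pattern copied from the pre_tokenizer_whitespace entry:
#   \w+       a run of word characters (letters, digits, underscore)
#   [^\w\s]+  a run of characters that are neither word characters nor whitespace
PATTERN = re.compile(r"\w+|[^\w\s]+")


def whitespace_split(text):
    """Hypothetical helper: return (piece, (start, end)) pairs for each match.

    Whitespace is never part of a piece; it only separates pieces.
    """
    return [(m.group(), m.span()) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for piece, span in whitespace_split("Hello, world! It's 2024."):
        print(piece, span)
```

Punctuation is split from adjacent words (so "It's" becomes three pieces), while consecutive punctuation characters such as "?!" stay together as one piece.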