tok-package | tok: Fast Text Tokenization |
decoder_byte_level | Byte-level decoder |
encoding | Encoding |
model_bpe | BPE model |
model_unigram | An implementation of the Unigram algorithm |
model_wordpiece | An implementation of the WordPiece algorithm |
normalizer_nfc | NFC normalizer |
normalizer_nfkc | NFKC normalizer |
pre_tokenizer | Generic class for pre-tokenizers |
pre_tokenizer_byte_level | Byte-level pre-tokenizer |
pre_tokenizer_whitespace | Whitespace pre-tokenizer: splits text into runs of word characters and runs of punctuation using the regex \w+|[^\w\s]+ |
processor_byte_level | Byte-level post-processor |
tok | tok: Fast Text Tokenization |
tokenizer | Tokenizer |
tok_decoder | Generic class for decoders |
tok_model | Generic class for tokenization models |
tok_normalizer | Generic class for normalizers |
tok_processor | Generic class for processors |
tok_trainer | Generic training class |
trainer_bpe | BPE trainer |
trainer_unigram | Unigram tokenizer trainer |
trainer_wordpiece | WordPiece tokenizer trainer |
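
The regex listed for pre_tokenizer_whitespace can be tried on its own to see which pieces it produces. The sketch below uses only Python's standard `re` module and does not call the tok package; the function name `whitespace_split` is hypothetical and chosen for illustration. It assumes the pattern is applied as a "find all matches" scan, which is one plausible reading of "splits using the following regex".

```python
import re

# The pattern quoted in the index entry: a run of word characters,
# or a run of characters that are neither word characters nor whitespace.
WHITESPACE_PATTERN = re.compile(r"\w+|[^\w\s]+")


def whitespace_split(text: str) -> list[str]:
    """Return every non-overlapping match of the pattern, left to right.

    Hypothetical helper for illustration only; not part of tok.
    Whitespace matches neither alternative, so it is dropped.
    """
    return WHITESPACE_PATTERN.findall(text)


if __name__ == "__main__":
    print(whitespace_split("Hello, world!"))
    print(whitespace_split("wait...what?"))
```

Consecutive punctuation stays together as one piece (for example `...`), while a word followed by punctuation is separated at the boundary.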