Fast Text Tokenization


Documentation for package ‘tok’ version 0.1.3

Help Pages

tok-package tok: Fast Text Tokenization
decoder_byte_level Byte-level decoder
encoding Encoding
model_bpe BPE model
model_unigram An implementation of the Unigram algorithm
model_wordpiece An implementation of the WordPiece algorithm
normalizer_nfc NFC normalizer
normalizer_nfkc NFKC normalizer
pre_tokenizer Generic class for pre-tokenizers
pre_tokenizer_byte_level Byte-level pre-tokenizer
pre_tokenizer_whitespace This pre-tokenizer simply splits using the following regex: \w+|[^\w\s]+
processor_byte_level Byte-level post-processor
tok tok: Fast Text Tokenization
tokenizer Tokenizer
tok_decoder Generic class for decoders
tok_model Generic class for tokenization models
tok_normalizer Generic class for normalizers
tok_processor Generic class for processors
tok_trainer Generic training class
trainer_bpe BPE trainer
trainer_unigram Unigram tokenizer trainer
trainer_wordpiece WordPiece tokenizer trainer
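
The splitting rule quoted for pre_tokenizer_whitespace can be tried out without the package. The sketch below is only an illustration in base R: it does not call tok, and it assumes that R's PCRE engine (perl = TRUE) treats \w and \s the same way as the package's regex engine for plain ASCII input. The variable names are made up for the example.

```r
# Illustration only: base R, not the tok API.
# The pattern is the one quoted in the index entry: \w+|[^\w\s]+
# (a run of word characters, or a run of characters that are
# neither word characters nor whitespace).
text <- "Hello, world! It's fast."
pattern <- "\\w+|[^\\w\\s]+"

# gregexpr() finds every match; regmatches() extracts the matched pieces.
tokens <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
print(tokens)
# "Hello" ","  "world" "!"  "It"  "'"  "s"  "fast" "."
```

Whitespace never appears in the result because neither alternative of the pattern can match it; punctuation is kept, but as separate pieces from the neighbouring words.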