Text Tokenization using Byte Pair Encoding and Unigram Modelling


Documentation for package ‘sentencepiece’ version 0.2.3

Help Pages

BPEembed                      Tokenise and embed text alongside a Sentencepiece and Word2vec model
BPEembedder                   Build a BPEembed model containing a Sentencepiece and Word2vec model
predict.BPEembed              Encode and Decode alongside a BPEembed model
read_word2vec                 Read a word2vec embedding file
sentencepiece                 Construct a Sentencepiece model
sentencepiece_decode          Decode encoded sequences back to text
sentencepiece_download_model  Download a Sentencepiece model
sentencepiece_encode          Tokenise text alongside a Sentencepiece model
sentencepiece_load_model      Load a Sentencepiece model
txt_remove_                   Remove prefixed underscore
wordpiece_encode              Wordpiece encoding