| trainer_bpe {tok} | R Documentation |
BPE trainer
Description
Trainer for Byte-Pair Encoding (BPE) tokenizer models.
Super class
tok::tok_trainer -> tok_trainer_bpe
Methods
Public methods
Method new()
Constructor for the BPE trainer.
Usage
trainer_bpe$new(
  vocab_size = NULL,
  min_frequency = NULL,
  show_progress = NULL,
  special_tokens = NULL,
  limit_alphabet = NULL,
  initial_alphabet = NULL,
  continuing_subword_prefix = NULL,
  end_of_word_suffix = NULL,
  max_token_length = NULL
)
Arguments
vocab_size
  The size of the final vocabulary, including all tokens and alphabet. Default: NULL.
min_frequency
  The minimum frequency a pair should have in order to be merged. Default: NULL.
show_progress
  Whether to show progress bars while training. Default: TRUE.
special_tokens
  A list of special tokens the model should be aware of. Default: NULL.
limit_alphabet
  The maximum number of different characters to keep in the alphabet. Default: NULL.
initial_alphabet
  A list of characters to include in the initial alphabet, even if not seen in the training dataset. Default: NULL.
continuing_subword_prefix
  A prefix to be used for every subword that is not a beginning-of-word. Default: NULL.
end_of_word_suffix
  A suffix to be used for every subword that is an end-of-word. Default: NULL.
max_token_length
  Prevents creating tokens longer than the specified size. Default: NULL.
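Examples
A minimal sketch of how the constructor might be used. The trainer arguments are the ones documented above. The remaining calls (tokenizer$new(), model_bpe$new(), pre_tokenizer_whitespace$new(), the pre_tokenizer field, train_from_memory(), and encode()) are assumed from the rest of the tok package and are not described on this page, so check their own help pages before relying on them. The corpus and the argument values are illustrative only.

```r
library(tok)

# Configure the trainer with arguments documented on this page.
# The values are arbitrary and chosen for a toy corpus.
trainer <- trainer_bpe$new(
  vocab_size = 200,      # upper bound on the final vocabulary size
  min_frequency = 2,     # a pair must occur at least twice to be merged
  show_progress = FALSE  # no progress bar for such a small run
)

# ASSUMPTION: everything below uses other parts of tok that this page
# does not document; the names are taken from the package as I understand it.
tk <- tokenizer$new(model_bpe$new())
tk$pre_tokenizer <- pre_tokenizer_whitespace$new()

# Illustrative training data.
texts <- c(
  "the quick brown fox",
  "the quick red fox",
  "the lazy brown dog"
)

# Learn merges from the in-memory corpus using the trainer settings.
tk$train_from_memory(texts, trainer)

# Encode a new string and inspect the resulting token ids.
enc <- tk$encode("the quick dog")
enc$ids
```

A small vocab_size and a min_frequency of 2 keep the learned merges limited to pairs that actually repeat in the toy corpus; real training data would normally call for a much larger vocabulary.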
Method clone()
The objects of this class are cloneable with this method.
Usage
trainer_bpe$clone(deep = FALSE)
Arguments
deep
  Whether to make a deep clone.
See Also
Other trainer: tok_trainer, trainer_unigram, trainer_wordpiece