trainer_bpe {tok} | R Documentation
BPE trainer
Description
BPE trainer
Super class
tok::tok_trainer -> tok_trainer_bpe
Methods
Public methods
Method new()
Constructor for the BPE trainer.
Usage
trainer_bpe$new(
  vocab_size = NULL,
  min_frequency = NULL,
  show_progress = NULL,
  special_tokens = NULL,
  limit_alphabet = NULL,
  initial_alphabet = NULL,
  continuing_subword_prefix = NULL,
  end_of_word_suffix = NULL,
  max_token_length = NULL
)
Arguments
vocab_size
The size of the final vocabulary, including all tokens and alphabet. Default: NULL.
min_frequency
The minimum frequency a pair should have in order to be merged. Default: NULL.
show_progress
Whether to show progress bars while training. Default: TRUE.
special_tokens
A list of special tokens the model should be aware of. Default: NULL.
limit_alphabet
The maximum number of different characters to keep in the alphabet. Default: NULL.
initial_alphabet
A list of characters to include in the initial alphabet, even if not seen in the training dataset. Default: NULL.
continuing_subword_prefix
A prefix to be used for every subword that is not a beginning-of-word. Default: NULL.
end_of_word_suffix
A suffix to be used for every subword that is an end-of-word. Default: NULL.
max_token_length
Prevents creating tokens longer than the specified size. Default: NULL.
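Examples
A minimal sketch of constructing a trainer with a few of the arguments above. It assumes the tok package is installed and that plain numeric and logical values are accepted for these arguments. The specific values (1000, 2) are illustrative only and do not come from the package documentation.

```r
# Sketch only: runs when the tok package is available, otherwise does nothing.
if (requireNamespace("tok", quietly = TRUE)) {
  trainer <- tok::trainer_bpe$new(
    vocab_size = 1000,     # illustrative target vocabulary size
    min_frequency = 2,     # illustrative: merge only pairs seen at least twice
    show_progress = FALSE  # suppress progress bars while training
  )

  # Per the "Super class" section, the object should inherit from tok_trainer.
  print(inherits(trainer, "tok_trainer"))
}
```

Arguments left at NULL are not set explicitly here, so the defaults listed above apply.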
Method clone()
The objects of this class are cloneable with this method.
Usage
trainer_bpe$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
Other trainer: tok_trainer, trainer_unigram, trainer_wordpiece