trainer_bpe {tok}    R Documentation

BPE trainer

Description

BPE trainer

Super class

tok::tok_trainer -> tok_trainer_bpe

Methods

Public methods


Method new()

Constructor for the BPE trainer

Usage
trainer_bpe$new(
  vocab_size = NULL,
  min_frequency = NULL,
  show_progress = NULL,
  special_tokens = NULL,
  limit_alphabet = NULL,
  initial_alphabet = NULL,
  continuing_subword_prefix = NULL,
  end_of_word_suffix = NULL,
  max_token_length = NULL
)
Arguments
vocab_size

The size of the final vocabulary, including all tokens and the alphabet. Default: NULL.

min_frequency

The minimum frequency a pair should have in order to be merged. Default: NULL.

show_progress

Whether to show progress bars while training. Default: NULL, which is treated as TRUE (progress bars are shown).

special_tokens

A list of special tokens the model should be aware of. Default: NULL.

limit_alphabet

The maximum number of different characters to keep in the alphabet. Default: NULL.

initial_alphabet

A list of characters to include in the initial alphabet, even if not seen in the training dataset. Default: NULL.

continuing_subword_prefix

A prefix to be used for every subword that is not a beginning-of-word. Default: NULL.

end_of_word_suffix

A suffix to be used for every subword that is an end-of-word. Default: NULL.

max_token_length

Prevents creating tokens longer than the specified size. Default: NULL.
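
As an illustration of the arguments above, the following is a minimal sketch of constructing a trainer. The specific values (1000, 2, "</w>") are arbitrary choices for the example, not recommendations from the package, and whole-number arguments are written as integer literals on the assumption that the constructor expects integers.

```r
library(tok)

# Build a BPE trainer with a few of the documented arguments set.
# Arguments left out keep their NULL default.
trainer <- trainer_bpe$new(
  vocab_size = 1000L,          # target size of the final vocabulary
  min_frequency = 2L,          # only merge pairs seen at least twice
  show_progress = FALSE,       # silence progress bars, e.g. in scripts
  end_of_word_suffix = "</w>"  # illustrative suffix marking word ends
)

# The object inherits from the tok_trainer super class.
inherits(trainer, "tok_trainer")
```

The resulting object is not used on its own; it is meant to be handed to a tokenizer when training. See the tokenizer documentation for the exact training call.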


Method clone()

The objects of this class are cloneable with this method.

Usage
trainer_bpe$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
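
A short sketch of cloning, assuming a trainer built with default arguments. The clone is a separate R6 object of the same class; whether any underlying state is shared between a shallow clone and the original is not specified here.

```r
library(tok)

trainer <- trainer_bpe$new()

# Shallow clone (deep = FALSE is the default).
trainer_copy <- trainer$clone()

# The copy is a distinct R6 object with the same class.
identical(trainer_copy, trainer)
inherits(trainer_copy, "tok_trainer_bpe")
```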

See Also

Other trainer: tok_trainer, trainer_unigram, trainer_wordpiece


[Package tok version 0.1.3 Index]