R: An implementation of the WordPiece algorithm

model_wordpiece {tok}

R Documentation

An implementation of the WordPiece algorithm

tok::tok_model -> tok_model_wordpiece

Constructor for the WordPiece tokenizer model.

model_wordpiece$new(
  vocab = NULL,
  unk_token = NULL,
  max_input_chars_per_word = NULL
)

vocab: A dictionary of string keys and their corresponding ids. Default: NULL.
unk_token: The unknown token to be used by the model. Default: NULL.
max_input_chars_per_word: The maximum number of characters to allow in a single word. Default: NULL.
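
The following base R sketch illustrates how the three arguments could interact when a single word is tokenized. It is not the tok implementation: `wordpiece_split` is a hypothetical helper, the vocabulary is made up, and the `##` continuation prefix and the default limit of 100 characters are assumptions borrowed from common WordPiece setups. It also assumes that a word longer than max_input_chars_per_word, or one with no matching piece, maps to unk_token.

```r
# Illustrative greedy longest-match-first WordPiece split for one word.
# `wordpiece_split` is a hypothetical helper, not part of the tok package.
wordpiece_split <- function(word, vocab, unk_token = "[UNK]",
                            max_input_chars_per_word = 100L,
                            prefix = "##") {  # "##" prefix is an assumption
  n <- nchar(word)
  # Assumed behaviour: over-long words become the unknown token outright.
  if (n > max_input_chars_per_word) return(unk_token)

  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    found <- NULL
    # Try the longest remaining substring first, shrinking from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Pieces that do not start the word carry the continuation prefix.
      if (start > 1L) candidate <- paste0(prefix, candidate)
      if (candidate %in% names(vocab)) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If nothing matches at this position, the whole word is unknown.
    if (is.null(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

# Made-up vocabulary: token strings as names, integer ids as values.
vocab <- c("[UNK]" = 0L, "un" = 1L, "##aff" = 2L, "##able" = 3L,
           "play" = 4L, "##ing" = 5L)

pieces <- wordpiece_split("unaffable", vocab)  # "un" "##aff" "##able"
ids <- unname(vocab[pieces])                   # 1 2 3
```

With this vocabulary, `wordpiece_split("xyz", vocab)` returns `"[UNK]"` because no piece matches, and `wordpiece_split("playing", vocab, max_input_chars_per_word = 3L)` returns `"[UNK]"` because the word exceeds the length limit.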

The objects of this class are cloneable with this method.

model_wordpiece$clone(deep = FALSE)