model_bpe {tok}    R Documentation

BPE model

Description

BPE (Byte-Pair Encoding) tokenization model.

Super class

tok::tok_model -> tok_model_bpe

Methods

Public methods


Method new()

Initializes a BPE model, an implementation of the BPE (Byte-Pair Encoding) algorithm.
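The merge procedure can be illustrated with a small base-R sketch. The helper bpe_encode_word() below is hypothetical and is not part of tok: it splits a word into characters and repeatedly merges the adjacent pair whose rule appears earliest in merges, optionally skipping candidate merges at random to mimic BPE dropout. The package's own encoder may differ in details such as caching and tie-breaking.

```r
# Hypothetical helper, NOT tok's implementation: apply BPE merge rules to one word.
# merges:  list of length-2 character vectors, ordered by priority (first = highest).
# dropout: probability of skipping each candidate merge at every step; 0 disables it.
bpe_encode_word <- function(word, merges, dropout = 0) {
  pieces <- strsplit(word, "")[[1]]  # start from single characters

  # Lookup table: "left right" -> priority rank (smaller = merged first)
  ranks <- setNames(seq_along(merges),
                    vapply(merges, paste, character(1), collapse = " "))

  repeat {
    if (length(pieces) < 2) break
    pairs <- paste(pieces[-length(pieces)], pieces[-1])  # adjacent pairs
    r <- ranks[pairs]                                    # NA where no rule exists
    if (dropout > 0) {
      # BPE dropout: each candidate merge is dropped with probability `dropout`
      r[runif(length(r)) < dropout] <- NA
    }
    if (all(is.na(r))) break
    i <- which.min(r)  # leftmost pair with the best (lowest) rank
    pieces[i] <- paste0(pieces[i], pieces[i + 1])
    pieces <- pieces[-(i + 1)]
  }
  pieces
}

merges <- list(c("l", "o"), c("lo", "w"), c("e", "r"))
bpe_encode_word("lower", merges)               # returns c("low", "er")
bpe_encode_word("lower", merges, dropout = 1)  # every merge dropped: single characters
```

With dropout = 0 the result is deterministic; with a value strictly between 0 and 1 the same word can be segmented differently on each call, which is the regularization effect dropout is meant to provide.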

Usage
model_bpe$new(
  vocab = NULL,
  merges = NULL,
  cache_capacity = NULL,
  dropout = NULL,
  unk_token = NULL,
  continuing_subword_prefix = NULL,
  end_of_word_suffix = NULL,
  fuse_unk = NULL,
  byte_fallback = FALSE
)
Arguments
vocab

A named integer vector mapping each token string (the names) to its id. Default: NULL.

merges

A list of token pairs, each of the form [character, character]. Default: NULL.

cache_capacity

The number of words that the BPE cache can contain. The cache speeds up encoding by storing the results of merge operations for previously seen words. Default: NULL.

dropout

A number between 0 and 1 giving the BPE dropout probability, i.e. the probability of skipping a merge during encoding. Default: NULL.

unk_token

The unknown token to be used by the model. Default: NULL.

continuing_subword_prefix

The prefix to attach to subword units that don't represent the beginning of a word. Default: NULL.

end_of_word_suffix

The suffix to attach to subword units that represent the end of a word. Default: NULL.

fuse_unk

Whether to fuse consecutive unknown tokens into a single one. Default: NULL.

byte_fallback

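Whether to use the SentencePiece (spm) byte-fallback trick, which encodes characters missing from the vocabulary as byte tokens instead of the unknown token. Default: FALSE.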
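As a sketch of the argument shapes described above (toy values, assumed for illustration), the following builds a small vocab and merges pair, checks that every merge and its result appear in the vocabulary, and shows how a prefix or suffix would mark pieces. decorate_pieces() is a hypothetical helper that only illustrates the naming convention; it does not reproduce how the model applies these options internally. The model itself is constructed only if tok is installed, using the arguments from the Usage section.

```r
# Toy vocabulary (hypothetical values): named integer vector, token -> id
vocab <- c("[UNK]" = 0L, "l" = 1L, "o" = 2L, "w" = 3L, "e" = 4L, "r" = 5L,
           "lo" = 6L, "low" = 7L, "er" = 8L)

# Merge rules: list of [character, character] pairs, highest priority first
merges <- list(c("l", "o"), c("lo", "w"), c("e", "r"))

# Sanity check: both parts of each merge and the merged token are in the vocabulary
merge_ok <- vapply(merges, function(p) {
  all(c(p, paste0(p, collapse = "")) %in% names(vocab))
}, logical(1))
stopifnot(all(merge_ok))

# Hypothetical helper: mark non-initial pieces with a prefix and the final piece
# with a suffix, mirroring continuing_subword_prefix / end_of_word_suffix naming
decorate_pieces <- function(pieces, prefix = NULL, suffix = NULL) {
  n <- length(pieces)
  if (!is.null(prefix) && n > 1) pieces[-1] <- paste0(prefix, pieces[-1])
  if (!is.null(suffix) && n > 0) pieces[n] <- paste0(pieces[n], suffix)
  pieces
}
decorate_pieces(c("low", "er"), prefix = "##")    # returns c("low", "##er")
decorate_pieces(c("low", "er"), suffix = "</w>")  # returns c("low", "er</w>")

# Build the model only when tok is available (arguments as in the Usage section)
if (requireNamespace("tok", quietly = TRUE)) {
  model <- tok::model_bpe$new(vocab = vocab, merges = merges, unk_token = "[UNK]")
}
```

Keeping every merge result in the vocabulary avoids producing tokens that have no id; the toy ids here start at 0, but any consistent assignment of unique integers would serve the illustration.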


Method clone()

The objects of this class are cloneable with this method.

Usage
model_bpe$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other model: model_unigram, model_wordpiece, tok_model


[Package tok version 0.1.3 Index]