bpe_load_model {tokenizers.bpe}    R Documentation

Load a Byte Pair Encoding model

Description

Load a Byte Pair Encoding model that was trained with bpe.

Usage

bpe_load_model(file, threads = -1L)

Arguments

file

path to the model

threads

integer giving the number of CPU threads to use for model processing. If equal to -1, the minimum of the number of available threads and 8 is used.
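
The default rule for threads can be pictured with a small sketch. The helper resolve_threads below is hypothetical and is not part of tokenizers.bpe; it only mirrors the rule described above, assuming that the number of available threads is what parallel::detectCores() reports. The package itself may determine that number differently.

```r
# Hypothetical helper (not part of tokenizers.bpe): mirrors the documented
# rule that threads = -1 means min(available threads, 8).
resolve_threads <- function(threads = -1L, available = parallel::detectCores()) {
  threads <- as.integer(threads)
  if (threads == -1L) {
    # detectCores() can return NA on some platforms; fall back to 1 thread
    if (is.na(available)) available <- 1L
    return(as.integer(min(available, 8L)))
  }
  # Any other value is passed through unchanged
  threads
}

resolve_threads(-1L, available = 4L)   # 4: fewer than 8 threads available
resolve_threads(-1L, available = 32L)  # 8: capped at 8
resolve_threads(2L,  available = 32L)  # 2: explicit value is kept
```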

Value

an object of class youtokentome, which is a list with the following elements:

  1. model: an Rcpp pointer to the model

  2. model_path: the path to the model

  3. threads: the threads argument

  4. vocab_size: the size of the BPE vocabulary

  5. vocabulary: the BPE vocabulary, which is a data.frame with columns id and subword
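
As a minimal sketch of how the elements listed above might be inspected, the following loads the model file shipped with the package (the same path used in the Examples section) and looks at a few fields. It assumes the tokenizers.bpe package is installed; no particular output is shown because it depends on the bundled model.

```r
library(tokenizers.bpe)

# Model file bundled with the package (same path as in the Examples section)
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path, threads = 1)

class(model)            # class of the returned object
model$model_path        # path the model was loaded from
model$vocab_size        # size of the BPE vocabulary
head(model$vocabulary)  # first rows of the data.frame with columns id and subword
```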

Examples

## Reload a model
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path)

## Build a model and load it again

data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
model <- bpe_load_model(model$model_path, threads = 1)

## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)

[Package tokenizers.bpe version 0.1.3 Index]