bpe_load_model {tokenizers.bpe} | R Documentation
Load a Byte Pair Encoding model
Description
Load a Byte Pair Encoding model trained with bpe
Usage
bpe_load_model(file, threads = -1L)
Arguments
file: path to the model
threads: integer with the number of CPU threads to use for model processing. If equal to -1, the minimum of the number of available threads and 8 is used
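The default rule for threads can be illustrated with a small base R sketch. This is not the package's own code: the helper name resolve_threads is hypothetical, and using parallel::detectCores() as the count of "available threads" is an assumption made only for illustration.

```r
# Hypothetical helper, not part of tokenizers.bpe: mirrors the documented
# rule that threads = -1 means min(available threads, 8).
resolve_threads <- function(threads = -1L, max_default = 8L) {
  stopifnot(length(threads) == 1L)
  threads <- as.integer(threads)
  if (threads == -1L) {
    # Assumption: detectCores() approximates the available thread count.
    available <- parallel::detectCores()
    if (is.na(available)) available <- 1L  # detectCores() may return NA
    return(min(as.integer(available), as.integer(max_default)))
  }
  threads
}

resolve_threads(2L)   # an explicit value is passed through: returns 2
resolve_threads(-1L)  # at most 8, depending on the machine
```

Passing an explicit value such as threads = 1 keeps runs predictable on shared machines, which is why the examples on this page set it.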
Value
an object of class youtokentome, which is a list with elements:
model: an Rcpp pointer to the model
model_path: the path to the model
threads: the threads argument
vocab_size: the size of the BPE vocabulary
vocabulary: the BPE vocabulary, which is a data.frame with columns id and subword
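The elements listed above can be inspected directly on the returned object. The following is a minimal sketch that assumes the tokenizers.bpe package is installed and uses the model file referenced in the Examples section; no output is shown because it depends on that model.

```r
library(tokenizers.bpe)

# Load the model file that ships in the package's extdata folder
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path, threads = 1L)

class(model)             # class of the returned object
names(model)             # elements of the list
model$model_path         # path the model was loaded from
model$threads            # the threads argument as passed
model$vocab_size         # size of the BPE vocabulary
head(model$vocabulary)   # data.frame with columns id and subword
```

The model element is a pointer to an in-memory object, so it is generally safer to keep model_path and call bpe_load_model again in a new R session than to rely on a saved copy of the list.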
Examples
library(tokenizers.bpe)
## Reload a model
path <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path)
## Build a model and load it again
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
model <- bpe_load_model(model$model_path, threads = 1)
## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)
[Package tokenizers.bpe version 0.1.3 Index]