
bpe_decode {tokenizers.bpe}

R Documentation

Decode Byte Pair Encoding sequences to text

Description

Decode a sequence of Byte Pair Encoding (BPE) ids back into text.
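To show roughly what decoding involves, the base-R sketch below maps each id to its subword through a small vocabulary table, concatenates the pieces, and turns the word-boundary marker `▁` (U+2581, the marker YouTokenToMe puts in front of word-initial subwords) into spaces. It is only a toy illustration under stated assumptions: `toy_vocab` and `toy_bpe_decode` are hypothetical, the vocabulary is assumed to be a data frame with `id` and `subword` columns, and this is not how `bpe_decode` is implemented internally.

```r
# Hypothetical toy vocabulary (assumption: columns `id` and `subword`,
# with "\u2581" marking the start of a word as in YouTokenToMe).
toy_vocab <- data.frame(
  id      = 0:5,
  subword = c("<PAD>", "\u2581gr", "and", "\u2581cent", "re", "\u2581en"),
  stringsAsFactors = FALSE
)

# Hypothetical decoder: ids -> subwords -> text
toy_bpe_decode <- function(vocab, ids) {
  # Every id must exist in the toy vocabulary
  stopifnot(all(ids %in% vocab$id))
  # Look up the subword for each id, keeping the order of `ids`
  pieces <- vocab$subword[match(ids, vocab$id)]
  # Glue the pieces together, then turn word-boundary markers into spaces
  text <- gsub("\u2581", " ", paste(pieces, collapse = ""), fixed = TRUE)
  # Drop the leading space produced by the first word marker
  sub("^ ", "", text)
}

toy_bpe_decode(toy_vocab, c(5L, 1L, 2L))  # returns "en grand"

# Several id sequences at once, one decoded string per sequence
lapply(list(c(5L, 1L, 2L), c(3L, 4L)), toy_bpe_decode, vocab = toy_vocab)
```

The lookup-and-concatenate step is all that is needed here because BPE merges only ever join adjacent characters, so the original text is recovered by placing the subwords side by side and restoring the spaces encoded by the marker.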

Usage

bpe_decode(model, x, ...)

Arguments

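`model`	an object of class `youtokentome`, as returned by `bpe` or `bpe_load_model`
`x`	an integer vector of BPE ids, or a list of such vectors (for example the output of `bpe_encode(model, x, type = "ids")`)
`...`	further arguments passed on to `youtokentome_encode_as_ids`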

Examples

## Train a small BPE model on the French part of the Belgian parliament data
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
model
str(model$vocabulary)

## Encode new text, either as subword strings or as integer ids
text <- c("L'appartement est grand & vraiment bien situe en plein centre",
          "Proportion de femmes dans les situations de famille monoparentale.")
bpe_encode(model, x = text, type = "subwords")
bpe_encode(model, x = text, type = "ids")

## Decode the ids back into text
encoded <- bpe_encode(model, x = text, type = "ids")
decoded <- bpe_decode(model, encoded)
decoded

## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)

[Package tokenizers.bpe version 0.1.3 Index]