R: Compute BLEU for a Sentence with Tokenization

bleu_sentence {sacRebleu}

R Documentation

Compute BLEU for a Sentence with Tokenization

Description

This function tokenizes the input with the 'tok' library and computes the BLEU score. Either an already initialized tokenizer can be provided via the 'tokenizer' argument, or a valid Hugging Face identifier (string) can be passed. If only an identifier is given, the tokenizer is newly initialized on every call.

Usage

bleu_sentence(
  references,
  candidate,
  tokenizer = "bert-base-cased",
  n = 4,
  weights = NULL,
  smoothing = NULL,
  epsilon = 0.1,
  k = 1
)

Arguments

`references`	A list of reference sentences.
`candidate`	A candidate sentence.
`tokenizer`	Either an already initialized 'tok' tokenizer object or a Hugging Face identifier (default is 'bert-base-cased').
`n`	Maximum n-gram order for the BLEU score (default is set to 4).
`weights`	Weights for the n-grams (default is set to 1/n for each entry).
`smoothing`	Smoothing method for the BLEU score (default is 'standard'; 'floor' and 'add-k' are also available).
`epsilon`	Epsilon value for epsilon-smoothing (default is set to 0.1).
`k`	K value for add-k-smoothing (default is set to 1).
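
To make the roles of `n`, `weights`, and `k` more concrete, the following is a minimal base-R sketch of a sentence-level BLEU computation. It is not the implementation used by sacRebleu: the helper names (`ngrams`, `clipped_matches`, `sketch_bleu`) are hypothetical, tokens are split on whitespace instead of using a 'tok' tokenizer, and applying add-k only to n-gram orders above 1 is an assumption made for illustration.

```r
# Hypothetical helper (not part of sacRebleu): all n-grams of a token vector.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: candidate n-gram matches, each clipped to the
# maximum count of that n-gram in any single reference.
clipped_matches <- function(cand_grams, ref_grams_list) {
  total <- 0
  for (g in unique(cand_grams)) {
    cand_count <- sum(cand_grams == g)
    max_ref <- max(vapply(ref_grams_list, function(r) sum(r == g), numeric(1)))
    total <- total + min(cand_count, max_ref)
  }
  total
}

# Sketch of sentence-level BLEU with whitespace tokenization.
# Assumes at least one reference. k = 0 means no smoothing; k > 0 adds k to
# the numerator and denominator for orders above 1 (an assumption here).
sketch_bleu <- function(references, candidate, n = 4, weights = NULL, k = 0) {
  if (is.null(weights)) weights <- rep(1 / n, n)  # uniform weights, 1/n each
  cand <- strsplit(candidate, "\\s+")[[1]]
  refs <- lapply(references, function(r) strsplit(r, "\\s+")[[1]])

  precisions <- vapply(seq_len(n), function(i) {
    cand_grams <- ngrams(cand, i)
    if (length(cand_grams) == 0) return(0)  # candidate shorter than order i
    matches <- clipped_matches(cand_grams, lapply(refs, ngrams, n = i))
    total <- length(cand_grams)
    if (i > 1) {
      matches <- matches + k
      total <- total + k
    }
    matches / total
  }, numeric(1))

  # Any zero precision makes the geometric mean zero.
  if (any(precisions == 0)) return(0)

  # Brevity penalty against the closest reference length (shorter on ties).
  cand_len <- length(cand)
  ref_lens <- vapply(refs, length, numeric(1))
  closest <- ref_lens[order(abs(ref_lens - cand_len), ref_lens)[1]]
  bp <- if (cand_len > closest) 1 else exp(1 - closest / cand_len)

  bp * exp(sum(weights * log(precisions)))
}

refs <- list("the cat sat on the mat", "a cat is on the mat")
sketch_bleu(refs, "the cat sat on a mat", n = 2)
```

Because short sentences often have no matching higher-order n-grams, an unsmoothed score with `n = 4` is frequently 0; this is the situation the smoothing arguments are meant to address.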

Value

The BLEU score for the candidate sentence.

Examples

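# Candidate sentence and a list of reference sentences
cand <- "Hello World!"
ref <- list("Hello everyone.", "Hello Planet", "Hello World")

# Initialize the tokenizer once and pass the object instead of an identifier
tok <- tok::tokenizer$from_pretrained("bert-base-uncased")
bleu_standard <- bleu_sentence(ref, cand, tok)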
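
Since passing only an identifier re-initializes the tokenizer on every call, scoring several candidates is a natural case for reusing one tokenizer object. The sketch below assumes the 'sacRebleu' and 'tok' packages are installed and that the pretrained tokenizer can be downloaded. The candidate sentences and the choice of `n = 2` are illustrative; the smoothing values are the ones listed under Arguments.

```r
library(sacRebleu)

# Initialize the tokenizer once; every call below reuses this object.
tok <- tok::tokenizer$from_pretrained("bert-base-cased")

ref <- list("Hello everyone.", "Hello Planet", "Hello World")
cands <- c("Hello World!", "Hello Planet")  # illustrative candidates

# Score each candidate with add-k smoothing, using bigrams at most
# because the sentences are short.
scores <- sapply(cands, function(cand) {
  bleu_sentence(ref, cand, tokenizer = tok, n = 2, smoothing = "add-k", k = 1)
})

# The same call with 'floor' smoothing and its epsilon parameter.
score_floor <- bleu_sentence(ref, cands[1], tokenizer = tok, n = 2,
                             smoothing = "floor", epsilon = 0.1)
```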

[Package sacRebleu version 0.1.3 Index]