bleu_sentence {sacRebleu}    R Documentation

Compute BLEU for a Sentence with Tokenization

Description

This function tokenizes the input with the 'tok' library and computes the BLEU score. Pass either an already initialized tokenizer or a valid Hugging Face identifier (string) via the 'tokenizer' argument. If only an identifier is passed, the tokenizer is newly initialized on every call.
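Because an identifier triggers a fresh initialization on each call, scoring many sentences is likely cheaper with a tokenizer that is created once and reused. The following is a minimal sketch of that pattern; the candidate strings are made up for illustration, and it assumes bleu_sentence() returns a single numeric value and that the model files can be downloaded.

```r
library(sacRebleu)

# Initialize the tokenizer once, outside the scoring loop.
tokenizer <- tok::tokenizer$from_pretrained("bert-base-cased")

refs <- list("Hello everyone.", "Hello Planet", "Hello World")
cands <- c("Hello World!", "Hello everyone!")  # illustrative candidates

# Reuse the same tokenizer object for every candidate.
# Assumption: each call returns one numeric score.
scores <- vapply(
  cands,
  function(cand) bleu_sentence(refs, cand, tokenizer),
  numeric(1)
)
```

Passing the string "bert-base-cased" instead of the object would give the same tokenization but repeat the setup work for each candidate.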

Usage

bleu_sentence(
  references,
  candidate,
  tokenizer = "bert-base-cased",
  n = 4,
  weights = NULL,
  smoothing = NULL,
  epsilon = 0.1,
  k = 1
)

Arguments

references

A list of reference sentences.

candidate

A candidate sentence.

tokenizer

Either an already initialized 'tok' tokenizer object or a Hugging Face identifier (default is 'bert-base-cased').

n

Maximum n-gram order used for the BLEU score (default is 4).

weights

Weights for the n-grams (default is set to 1/n for each entry).

smoothing

Smoothing method for the BLEU score: 'standard', 'floor', or 'add-k' (default is 'standard', which is used when the argument is left as NULL).

epsilon

Epsilon value for epsilon-smoothing (default is set to 0.1).

k

K value for add-k-smoothing (default is set to 1).

Value

The BLEU score for the candidate sentence.
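To make the returned quantity concrete, here is a base R sketch of the textbook unsmoothed sentence-level BLEU: clipped n-gram precisions combined by a weighted geometric mean and scaled by a brevity penalty. It works on already tokenized input, and the helper names (ngrams, modified_precision, bleu_sketch) are hypothetical. It is not the package's implementation, whose tokenization and edge-case handling may differ.

```r
# Hypothetical helpers for illustration; not part of sacRebleu.

# All n-grams of order n from a character vector of tokens.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Clipped n-gram precision: each candidate n-gram counts at most as often
# as it appears in the reference where it is most frequent.
modified_precision <- function(references, candidate, n) {
  cand_counts <- table(ngrams(candidate, n))
  if (length(cand_counts) == 0) return(0)
  clipped <- vapply(names(cand_counts), function(g) {
    max_ref <- max(vapply(references, function(r) sum(ngrams(r, n) == g),
                          numeric(1)))
    min(cand_counts[[g]], max_ref)
  }, numeric(1))
  sum(clipped) / sum(cand_counts)
}

# Unsmoothed BLEU: brevity penalty times the weighted geometric mean
# of the precisions for orders 1..n.
bleu_sketch <- function(references, candidate, n = 4, weights = rep(1 / n, n)) {
  p <- vapply(seq_len(n),
              function(k) modified_precision(references, candidate, k),
              numeric(1))
  if (any(p == 0)) return(0)  # a zero precision zeroes the whole score
  c_len <- length(candidate)
  ref_lens <- sort(lengths(references))
  # Closest reference length; ties go to the shorter reference.
  r_len <- ref_lens[which.min(abs(ref_lens - c_len))]
  bp <- if (c_len > r_len) 1 else exp(1 - r_len / c_len)
  bp * exp(sum(weights * log(p)))
}

refs <- list(c("hello", "everyone"), c("hello", "planet"), c("hello", "world"))
cand <- c("hello", "world")

bleu_sketch(refs, cand, n = 2)  # exact match up to bigrams
bleu_sketch(refs, cand, n = 4)  # no 3-grams or 4-grams exist in a 2-token candidate
```

With n = 4 the two-token candidate has no higher-order n-grams, so the unsmoothed score collapses to zero. This is the situation the 'floor' and 'add-k' smoothing options are meant to address.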

Examples

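cand <- "Hello World!"
ref <- list("Hello everyone.", "Hello Planet", "Hello World")

# Initialize the tokenizer once so it can be reused across calls
tok <- tok::tokenizer$from_pretrained("bert-base-uncased")
# Standard BLEU (no smoothing) with the default n = 4
bleu_standard <- bleu_sentence(ref, cand, tok)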
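The smoothing options listed under Arguments can be exercised in the same way. The sketch below uses only the documented argument names and values; the resulting scores are not shown because they depend on how the chosen tokenizer splits the text.

```r
library(sacRebleu)

cand <- "Hello World!"
ref <- list("Hello everyone.", "Hello Planet", "Hello World")
tokenizer <- tok::tokenizer$from_pretrained("bert-base-uncased")

# Floor smoothing with the documented default epsilon
bleu_floor <- bleu_sentence(ref, cand, tokenizer,
                            smoothing = "floor", epsilon = 0.1)

# Add-k smoothing with the documented default k
bleu_add_k <- bleu_sentence(ref, cand, tokenizer,
                            smoothing = "add-k", k = 1)
```

Short sentences often have no matching higher-order n-grams, so comparing these values against the unsmoothed score is a quick way to see how much the smoothing choice matters for a given dataset.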

[Package sacRebleu version 0.1.3 Index]