bleu_sentence {sacRebleu} | R Documentation |
Compute BLEU for a Sentence with Tokenization
Description
This function tokenizes the input with the 'tok' library and then computes the BLEU score. The 'tokenizer' argument accepts either an already initialized tokenizer or a valid Hugging Face identifier (string). If only an identifier is passed, a new tokenizer is initialized on every call, so passing an initialized tokenizer avoids repeated setup when scoring many sentences.
Usage
bleu_sentence(
references,
candidate,
tokenizer = "bert-base-cased",
n = 4,
weights = NULL,
smoothing = NULL,
epsilon = 0.1,
k = 1
)
Arguments
references |
A list of reference sentences. |
candidate |
A candidate sentence. |
tokenizer |
Either an already initialized 'tok' tokenizer object or a Hugging Face identifier string (default is 'bert-base-cased'). |
n |
Maximum n-gram order used for the BLEU score (default is 4). |
weights |
Weights for the n-gram orders. The default (NULL) assigns a weight of 1/n to each entry. |
smoothing |
Smoothing method for the BLEU score. The default is 'standard'; 'floor' and 'add-k' are also available. |
epsilon |
Epsilon value for epsilon-smoothing (default is set to 0.1). |
k |
K value for add-k-smoothing (default is set to 1). |
Value
The BLEU score for the candidate sentence.
Examples
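# Candidate sentence and a list of reference sentences
cand <- "Hello World!"
ref <- list("Hello everyone.", "Hello Planet", "Hello World")
# Initialize the tokenizer once so it can be reused across calls
tok <- tok::tokenizer$from_pretrained("bert-base-uncased")
# BLEU score with the default n, weights and smoothing
bleu_standard <- bleu_sentence(ref, cand, tok)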
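# The remaining arguments can be sketched as follows. This is a hedged
# illustration rather than a reference: it reuses `ref`, `cand` and `tok` from
# the lines above, assumes the smoothing names listed under Arguments are
# passed as strings, and the result variable names are hypothetical.
# 'floor' smoothing with the documented epsilon default
bleu_floor <- bleu_sentence(ref, cand, tok, smoothing = "floor", epsilon = 0.1)
# 'add-k' smoothing with the documented k default
bleu_add_k <- bleu_sentence(ref, cand, tok, smoothing = "add-k", k = 1)
# Bigram BLEU with explicit equal weights (the same values as the 1/n default)
bleu_bigram <- bleu_sentence(ref, cand, tok, n = 2, weights = c(0.5, 0.5))
# Passing a Hugging Face identifier instead of a tokenizer object; per the
# Description, the tokenizer is then initialized anew on this call
bleu_by_name <- bleu_sentence(ref, cand, tokenizer = "bert-base-cased")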