crf_evaluation {crfsuite} | R Documentation |
Basic classification evaluation metrics for multi-class labelling
Description
The accuracy, precision, recall, specificity, F1 measure and support metrics are provided for each label in a one-versus-the-rest setting.
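As a rough illustration of the one-versus-the-rest setting, the sketch below computes these metrics for a single label by hand in base R. The helper name one_vs_rest and the toy vectors are hypothetical and not part of crfsuite; the package's own implementation may differ in details such as the handling of missing values.

```r
# Hypothetical helper (not part of crfsuite): metrics for one label,
# treating that label as "positive" and every other label as "negative".
one_vs_rest <- function(pred, obs, label) {
  tp <- sum(pred == label & obs == label)  # true positives
  fp <- sum(pred == label & obs != label)  # false positives
  fn <- sum(pred != label & obs == label)  # false negatives
  tn <- sum(pred != label & obs != label)  # true negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(accuracy    = (tp + tn) / (tp + fp + fn + tn),
    precision   = precision,
    recall      = recall,
    specificity = tn / (tn + fp),
    f1          = 2 * precision * recall / (precision + recall),
    support     = tp + fn)  # number of gold occurrences of the label
}

# Toy data, made up for illustration
obs  <- c("PER", "PER", "O", "O",   "LOC", "O")
pred <- c("PER", "O",   "O", "PER", "LOC", "O")
one_vs_rest(pred, obs, label = "PER")
```

Repeating the call for every label gives one row of metrics per label, which is the shape of the per-label output described under Value.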
Usage
crf_evaluation(
  pred,
  obs,
  labels = na.exclude(unique(c(as.character(pred), as.character(obs)))),
  labels_overall = setdiff(labels, "O")
)
Arguments
pred |
a factor with predictions |
obs |
a factor with gold labels |
labels |
a character vector of possible values that pred and obs can take; defaults to the values found in pred and obs |
labels_overall |
a character vector of labels which is either the same as labels or a subset of it; defaults to all labels except "O" |
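The two defaults shown under Usage can be evaluated directly in base R to see which labels they produce. The toy factors below are made up for illustration.

```r
# Toy predictions and gold labels, made up for illustration
pred <- factor(c("B-PER", "O", "O", NA))
obs  <- factor(c("B-PER", "B-LOC", "O", "O"))

# Default for `labels`: every non-missing value seen in pred or obs
labels <- na.exclude(unique(c(as.character(pred), as.character(obs))))

# Default for `labels_overall`: the same labels without the outside tag "O"
labels_overall <- setdiff(labels, "O")

labels
labels_overall
```

Passing labels explicitly, as in the Examples, is a way to fix the set and order of labels regardless of which values happen to occur in the data.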
Value
a list with 2 elements:
bylabel: data.frame with the accuracy, precision, recall, specificity, F1 score and support (number of occurrences) for each label
overall: a vector containing
the overall accuracy
the metrics precision, recall, specificity and F1 score, which are weighted averages of these metrics from list element bylabel, where the weight is the support
the metrics precision, recall, specificity and F1 score, which are averages of these metrics from list element bylabel, giving equal weight to each label
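The difference between the support-weighted and the equal-weight averages can be sketched in base R as follows. The per-label table is toy data made up for illustration, and its column names are assumptions rather than the guaranteed layout of the bylabel element.

```r
# Toy per-label table, made up for illustration; column names are assumed
bylabel <- data.frame(
  label     = c("PER", "LOC", "ORG"),
  precision = c(0.90, 0.60, 0.30),
  support   = c(50, 30, 20)
)

# Support-weighted average: frequent labels count more
weighted <- weighted.mean(bylabel$precision, w = bylabel$support)

# Unweighted average: every label counts equally
unweighted <- mean(bylabel$precision)

c(weighted = weighted, unweighted = unweighted)
```

In this toy case the weighted average (0.69) is higher than the unweighted one (0.60) because the most frequent label also has the highest precision; with rare labels that score well, the ordering would flip.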
Examples
library(crfsuite)
## Random predictions versus random gold labels over 26 classes
pred <- sample(LETTERS, 1000, replace = TRUE)
gold <- sample(LETTERS, 1000, replace = TRUE)
crf_evaluation(pred = pred, obs = gold, labels = LETTERS)
## Download the Dutch CoNLL-2002 data and build a CRF model on the training part
x <- ner_download_modeldata("conll2002-nl")
x <- crf_cbind_attributes(x, terms = c("token", "pos"),
                          by = c("doc_id", "sentence_id"))
crf_train <- subset(x, data == "ned.train")
crf_test <- subset(x, data == "testa")
attributes <- grep("token|pos", colnames(x), value = TRUE)
model <- crf(y = crf_train$label,
             x = crf_train[, attributes],
             group = crf_train$doc_id,
             method = "lbfgs")
## Use the model to score on existing tokenised data
scores <- predict(model,
                  newdata = crf_test[, attributes],
                  group = crf_test$doc_id)
## Evaluate with the default labels, then with an explicit set of labels
crf_evaluation(pred = scores$label, obs = crf_test$label)
crf_evaluation(pred = scores$label, obs = crf_test$label,
               labels = c("O",
                          "B-ORG", "I-ORG", "B-PER", "I-PER",
                          "B-LOC", "I-LOC", "B-MISC", "I-MISC"))
## Collapse the B-/I- tags to entity types before evaluating
library(udpipe)
pred <- txt_recode(scores$label,
                   from = c("B-ORG", "I-ORG", "B-PER", "I-PER",
                            "B-LOC", "I-LOC", "B-MISC", "I-MISC"),
                   to = c("ORG", "ORG", "PER", "PER",
                          "LOC", "LOC", "MISC", "MISC"))
obs <- txt_recode(crf_test$label,
                  from = c("B-ORG", "I-ORG", "B-PER", "I-PER",
                           "B-LOC", "I-LOC", "B-MISC", "I-MISC"),
                  to = c("ORG", "ORG", "PER", "PER",
                         "LOC", "LOC", "MISC", "MISC"))
crf_evaluation(pred = pred, obs = obs,
               labels = c("ORG", "LOC", "PER", "MISC", "O"))