predict.crf {crfsuite}

R Documentation

Predict the label sequence based on the Conditional Random Field

Description

Predict the label sequence of new data using a fitted Conditional Random Field (CRF) model.

Usage

## S3 method for class 'crf'
predict(
  object,
  newdata,
  embeddings,
  group,
  type = c("marginal", "sequence"),
  trace = FALSE,
  ...
)

Arguments

`object`	an object of class crf as returned by `crf`
`newdata`	a character matrix containing the attributes used to predict the label sequence `y`, or an object which can be coerced to a character matrix. The data should be provided in the same format as was used for training the model.
`embeddings`	a numeric matrix with the same number of rows as `newdata`, in the same row order, containing numeric information used to predict
`group`	an integer or character vector with one element per row of `newdata`, indicating the group the sequence `y` belongs to (e.g. a document or sentence identifier)
`type`	either 'marginal' or 'sequence' to get predictions at the level of `newdata` or at the level of the sequence `group`. Defaults to `'marginal'`.
`trace`	a logical indicating whether to show the trace of the labelling output. Defaults to `FALSE`.
`...`	not used

Value

If type is 'marginal': a data.frame with columns label and marginal containing the Viterbi-decoded predicted label and its marginal probability.
If type is 'sequence': a data.frame with columns group and probability containing for each sequence group the probability of the sequence.
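
As a rough sketch of how the 'marginal' output might be used downstream, the base R snippet below flags low-confidence predictions. The `scores` data.frame here is hand-made toy data that only mimics the documented `label` and `marginal` columns. The label names, the probabilities and the 0.6 cutoff are all invented assumptions. In practice `scores` would come from `predict(object, newdata, group, type = "marginal")`.

```r
# Toy stand-in for the output of predict(..., type = "marginal").
# All values are invented for illustration only.
scores <- data.frame(
  label    = c("B-LOCATION", "I-LOCATION", "O", "O", "B-PERSON"),
  marginal = c(0.91, 0.55, 0.98, 0.42, 0.73),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff below which a prediction is treated as uncertain.
threshold <- 0.6
scores$uncertain <- scores$marginal < threshold

# Share of tokens with a confident prediction, and the uncertain rows.
mean(!scores$uncertain)
scores[scores$uncertain, ]
```

A suitable cutoff depends on the task, so the value used here should not be read as a recommendation.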

Examples



library(udpipe)
data(airbnb_chunks, package = "crfsuite")
udmodel <- udpipe_download_model("dutch-lassysmall")
udmodel <- udpipe_load_model(udmodel$file_model)
airbnb_tokens <- unique(airbnb_chunks[, c("doc_id", "text")])
airbnb_tokens <- udpipe_annotate(udmodel, 
                                 x = airbnb_tokens$text, 
                                 doc_id = airbnb_tokens$doc_id)
airbnb_tokens <- as.data.frame(airbnb_tokens)
x <- merge(airbnb_chunks, airbnb_tokens)
x <- crf_cbind_attributes(x, terms = c("upos", "lemma"), by = "doc_id")
model <- crf(y = x$chunk_entity, 
             x = x[, grep("upos|lemma", colnames(x))], 
             group = x$doc_id, 
             method = "lbfgs", options = list(max_iterations = 5)) 
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "marginal")
head(scores)
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "sequence")
head(scores)


## cleanup for CRAN
file.remove(model$file_model)
file.remove("modeldetails.txt")
file.remove(udmodel$file)

[Package crfsuite version 0.4.2 Index]
