R: Methods to correct koRpus objects

correct.tag {koRpus}

R Documentation

Methods to correct koRpus objects

Description

The method correct.tag can be used to alter objects of class kRp.text.

Usage

correct.tag(
  obj,
  row,
  tag = NULL,
  lemma = NULL,
  check.token = NULL,
  quiet = TRUE
)

## S4 method for signature 'kRp.text'
correct.tag(
  obj,
  row,
  tag = NULL,
  lemma = NULL,
  check.token = NULL,
  quiet = TRUE
)

Arguments

`obj`	An object of class `kRp.text`.
`row`	Integer, the row number of the entry to be changed. Can be an integer vector to change several rows in one go.
`tag`	A character string with a valid POS tag to replace the current tag entry. If `NULL` (the default) the entry remains unchanged.
`lemma`	A character string naming the lemma to replace the current lemma entry. If `NULL` (the default) the entry remains unchanged.
`check.token`	A character string naming the token you expect to be in this row. If not `NULL`, `correct.tag` will stop with an error if the values don't match.
`quiet`	If `FALSE`, messages about all applied changes are shown.

Details

Although automatic POS tagging and lemmatization are remarkably accurate, the algorithms do usually produce some errors. If you want to correct these flaws, this method can help, because it reduces the risk of introducing new errors: it performs some sanity checks before the object is actually manipulated and returned.

correct.tag will read the lang slot from the given object and check whether the tag provided is actually valid. If so, it will not only change the tag field in the object, but also update wclass and desc accordingly.

If check.token is set, it must also match the token in the given row(s). Note that the lemmata are not checked.
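
The checks described above can be tried out with a small sketch like the following. It assumes that the koRpus.lang.en package is installed and reuses the sample file from the Examples section; the variable names (`row_before`, `expected_token`, `corrected.obj`) are only illustrative. The token is first read from row 6 with `taggedText()` and then passed as `check.token`, so the call stops with an error if the row holds a different token than expected.

```r
# sketch only: runs when the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # look up what is currently stored in row 6, so check.token can
  # guard against changing a different row than intended
  row_before <- taggedText(tokenized.obj)[6, ]
  expected_token <- as.character(row_before[["token"]])

  # "NN" is the tag used in the Examples section; quiet=FALSE
  # shows messages about the applied changes
  corrected.obj <- correct.tag(
    tokenized.obj,
    row = 6,
    tag = "NN",
    check.token = expected_token,
    quiet = FALSE
  )

  # compare the row before and after the correction
  print(row_before[, c("token", "tag", "wclass", "desc")])
  print(taggedText(corrected.obj)[6, c("token", "tag", "wclass", "desc")])
}
```

Printing the row before and after the call makes it easy to see whether wclass and desc were updated together with tag. Because the original object is kept in `tokenized.obj`, the correction can be discarded if the result is not what was intended.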

Value

An object of the same class as obj.

Examples

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  tokenized.obj <- correct.tag(tokenized.obj, row=6, tag="NN")
} else {}

[Package koRpus version 0.13-8 Index]