correct.tag {koRpus} | R Documentation |
Methods to correct koRpus objects
Description
The method correct.tag can be used to alter objects of class kRp.text.
Usage
correct.tag(
obj,
row,
tag = NULL,
lemma = NULL,
check.token = NULL,
quiet = TRUE
)
## S4 method for signature 'kRp.text'
correct.tag(
obj,
row,
tag = NULL,
lemma = NULL,
check.token = NULL,
quiet = TRUE
)
Arguments
obj |
An object of class kRp.text. |
row |
Integer, the row number of the entry to be changed. Can be an integer vector to change several rows in one go. |
tag |
A character string with a valid POS tag to replace the current tag entry.
If NULL (the default), the entry remains unchanged. |
lemma |
A character string naming the lemma to replace the current lemma entry.
If NULL (the default), the entry remains unchanged. |
check.token |
A character string naming the token you expect to be in this row.
If not NULL, it must match the token found in the given row(s). |
quiet |
If FALSE, messages about the applied changes are shown. |
Details
Although automatic POS tagging and lemmatization are remarkably accurate, the algorithms usually do produce some errors. If you want to correct these flaws, this method can help, because it guards against introducing new errors: it performs some sanity checks before the object is actually manipulated and returned.
correct.tag will read the lang slot from the given object and check whether the tag provided is actually valid. If so, it will not only change the tag field in the object, but also update wclass and desc accordingly.
If check.token is set, it must also match token in the given row(s). Note that no check is done on the lemmata.
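The checks described above can be illustrated with a minimal sketch. It assumes that the koRpus.lang.en package is installed and that taggedText() is used to look at the token table; the variable names expected.token and corrected.obj are made up for this illustration. The token in row 6 is read from the object first and then passed as check.token, so the correction is only applied if the row still holds the token you expect.

```r
# Sketch only: does nothing unless the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # hypothetical helper variable: the token currently stored in row 6
  expected.token <- as.character(taggedText(tokenized.obj)[6, "token"])

  # change the tag of row 6, guarded by the expected token
  corrected.obj <- correct.tag(
    tokenized.obj,
    row = 6,
    tag = "NN",
    check.token = expected.token
  )

  # inspect the changed row; wclass and desc are updated along with tag
  print(taggedText(corrected.obj)[6, c("token", "tag", "wclass", "desc")])
}
```

Reading the token from the object is only a convenience for this sketch. In practice you would usually type the expected token by hand, which is what makes check.token useful as a safeguard against editing the wrong row.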
Value
An object of the same class as obj.
See Also
kRp.text, treetag, kRp.POS.tags.
Examples
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  tokenized.obj <- correct.tag(tokenized.obj, row=6, tag="NN")
} else {}