cTest {koRpus} | R Documentation |
Transform text into C-Test-like format
Description
If you feed a tagged text object to this function, its text will be transformed into a format used for C-Tests:
- the first and last sentence will be left untouched (unless the start and end values of the intact parameter are changed)
- of all other sentences, the second half of every 2nd word (or as specified by every) will be replaced by a line
- words must have at least min.length characters, otherwise they are skipped
- words with an uneven number of characters are split after the middle character, i.e., a word with five characters will keep the first three and have the last two replaced
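The word-level rule described above can be sketched in base R. This is only an illustration, not the koRpus implementation: blank_half() and c_test_words() are hypothetical helpers, the sketch assumes that each removed character is replaced by one copy of replace.by and that words are counted by position, and it ignores sentence boundaries, punctuation, and the intact parameter.

```r
# Hypothetical helper (not part of koRpus): blank the second half of one word.
# With an odd number of characters the middle character stays visible,
# so a five-letter word keeps three letters and loses two.
blank_half <- function(word, min.length = 3, replace.by = "_") {
  n <- nchar(word)
  if (n < min.length) {
    return(word)  # too short: the word is skipped
  }
  keep <- ceiling(n / 2)
  # Assumption: one replacement character per removed letter
  paste0(substr(word, 1, keep), strrep(replace.by, n - keep))
}

# Hypothetical helper (not part of koRpus): apply the rule to every n-th word
# of a character vector, counting words simply by their position.
c_test_words <- function(words, every = 2, min.length = 3, replace.by = "_") {
  hit <- seq_along(words) %% every == 0
  words[hit] <- vapply(
    words[hit], blank_half, character(1),
    min.length = min.length, replace.by = replace.by,
    USE.NAMES = FALSE
  )
  words
}

c_test_words(c("The", "quick", "brown", "fox", "jumps"))
# "The" "qui__" "brown" "fo_" "jumps"
```

The real method works on a tagged kRp.text object and records its changes in the diff feature, which this sketch does not attempt.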
Usage
cTest(obj, ...)
## S4 method for signature 'kRp.text'
cTest(
obj,
every = 2,
min.length = 3,
intact = c(start = 1, end = 1),
replace.by = "_"
)
Arguments
obj |
An object of class kRp.text. |
... |
Additional arguments to the method (as described in this document). |
every |
Integer numeric, setting the frequency of words to be manipulated. By default, every other word is transformed. |
min.length |
Integer numeric, sets the minimum length of words to be considered (in letters). |
intact |
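Named vector with the elements start and end, setting how many sentences at the beginning and at the end of the text are left untouched. By default, the first and last sentence are kept intact. |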
replace.by |
Character, will be used as the replacement for the removed word halves. |
Value
An object of class kRp.text with the added feature diff.
Examples
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  tokenized.obj <- cTest(tokenized.obj)
  pasteText(tokenized.obj)

  # diff stats are now part of the object
  hasFeature(tokenized.obj)
  diffText(tokenized.obj)
} else {}
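The arguments listed under Usage can also be set explicitly. The following variation on the example above is a sketch only: the argument values are arbitrary choices for illustration, and custom.obj is a name introduced here. It transforms every third word of at least four characters, leaves the first two and last two sentences intact, and uses a dot as the replacement character.

```r
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # illustrative, non-default settings for all four arguments
  custom.obj <- cTest(
    tokenized.obj,
    every=3,
    min.length=4,
    intact=c(start=2, end=2),
    replace.by="."
  )
  pasteText(custom.obj)
} else {}
```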