split_by_doc_id,kRp.corpus-method {tm.plugin.koRpus} | R Documentation |
Turn a kRp.corpus object into a list of kRp.text objects
Description
Some analysis steps require individual tagged texts rather than one large corpus object. This method splits a kRp.corpus object into a list with one kRp.text object per document.
Usage
## S4 method for signature 'kRp.corpus'
split_by_doc_id(obj, keepFeatures = TRUE)
Arguments
obj | An object of class kRp.corpus. |
keepFeatures | Either logical, whether to keep all features (TRUE) or drop them (FALSE), or a character vector of names of features to keep if present. |
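The three accepted forms of keepFeatures can be illustrated with a small base R sketch. The helper resolve_kept_features() is hypothetical and not part of tm.plugin.koRpus. It only mirrors the documented semantics (keep all, keep none, or keep the named features that are present), and the feature names used are illustrative assumptions.

```r
# Hypothetical helper (NOT part of tm.plugin.koRpus): resolve which feature
# names would be kept for a given value of keepFeatures, following the
# documented semantics of the argument.
resolve_kept_features <- function(available, keepFeatures = TRUE) {
  if (is.logical(keepFeatures)) {
    # TRUE keeps every available feature, FALSE drops them all
    if (isTRUE(keepFeatures)) available else character(0)
  } else if (is.character(keepFeatures)) {
    # keep only the requested features that are actually present
    intersect(keepFeatures, available)
  } else {
    stop("keepFeatures must be logical or a character vector")
  }
}

# illustrative feature names (assumed for this sketch)
feats <- c("lex_div", "readability", "freq")
resolve_kept_features(feats, TRUE)
resolve_kept_features(feats, FALSE)
resolve_kept_features(feats, c("freq", "hyphen"))
```

In the last call, "hyphen" is silently ignored because it is not among the available features, which matches "keep if present".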
Value
A named list of objects of class kRp.text. Elements are named by their doc_id.
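As a rough analogy for the shape of the return value, base R's split() turns one table of tokens into a list of per-document tables named by doc_id. This is a sketch on a toy data frame, not the package's implementation. The column names and doc_id values are assumptions chosen for illustration, and split() orders the list by the sorted doc_id values.

```r
# Toy token table with a doc_id column (values and columns are illustrative)
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2", "doc2", "doc2"),
  token  = c("Reality", "Winner", "Natalie", "Edwards", "."),
  stringsAsFactors = FALSE
)

# split() returns a list with one data frame per document,
# named by the unique doc_id values
by_doc <- split(tokens, tokens$doc_id)

names(by_doc)
sapply(by_doc, nrow)
```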
Examples
# use readCorpus() to create an object of class kRp.corpus
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )
  # split the corpus into a list with one kRp.text object per document
  myCorpusList <- split_by_doc_id(myCorpus)
  # list elements are named by doc_id
  names(myCorpusList)
}
[Package tm.plugin.koRpus version 0.4-2 Index]