split_by_doc_id,kRp.corpus-method {tm.plugin.koRpus}    R Documentation

Turn a kRp.corpus object into a list of kRp.text objects

Description

Some analysis steps require individual tagged texts rather than one large corpus object. This method splits a kRp.corpus object into a list with one kRp.text object per document.

Usage

## S4 method for signature 'kRp.corpus'
split_by_doc_id(obj, keepFeatures = TRUE)

Arguments

obj

An object of class kRp.corpus.

keepFeatures

Either a logical value indicating whether to keep all features (TRUE) or drop all of them (FALSE), or a character vector naming the features to keep; named features that are not present are ignored.

Value

A named list of objects of class kRp.text. Elements are named by their doc_id.

Examples

# use readCorpus() to create an object of class kRp.corpus
# the code only runs if the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )

  myCorpusList <- split_by_doc_id(myCorpus)
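
  # The lines below are a hedged sketch of possible follow-up steps and
  # not part of the original example. They assume koRpus is attached,
  # which loading tm.plugin.koRpus should normally ensure.

  # the result is an ordinary named list, so base R tools apply;
  # the names are the doc_ids of the individual texts
  names(myCorpusList)

  # each element is a kRp.text object; taggedText() from koRpus
  # returns its tokens as a data frame
  firstTokens <- taggedText(myCorpusList[[1]])
  head(firstTokens)

  # keepFeatures=FALSE drops all features instead of keeping them
  myCorpusListPlain <- split_by_doc_id(myCorpus, keepFeatures=FALSE)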
} else {}

[Package tm.plugin.koRpus version 0.4-2 Index]