read.corp.custom,kRp.corpus-method {tm.plugin.koRpus}    R Documentation

Apply read.corp.custom() to all texts in kRp.corpus objects

Description

This method calls read.corp.custom on all tagged text objects inside the given corpus object.

Usage

## S4 method for signature 'kRp.corpus'
read.corp.custom(corpus, caseSens = TRUE, log.base = 10,
      keep_dtm = FALSE, ...)

Arguments

corpus

An object of class kRp.corpus.

caseSens

Logical. If FALSE, all tokens will be matched in their lower case form.

log.base

A numeric value defining the base of the logarithm used for inverse document frequency (idf). See log for details.

keep_dtm

Logical. If TRUE and corpus does not yet provide a document term matrix, the matrix generated during the calculation will be added to the resulting object.

...

Options to pass through to the read.corp.custom method for objects of the class union kRp.text.
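The log.base argument sets the base b of the logarithm used for the idf. For orientation, the conventional definition for a term t in a corpus of N documents, of which df(t) contain t, is given below. This page does not state whether koRpus applies smoothing or another variant, so treat this as the textbook form rather than the package's exact formula.

```latex
\mathrm{idf}(t) = \log_b \frac{N}{\mathrm{df}(t)}
               = \frac{\ln\left(N / \mathrm{df}(t)\right)}{\ln b},
\qquad b = \texttt{log.base}
```

For example, with N = 2, df(t) = 1 and the default b = 10, this gives idf(t) = log10(2) ≈ 0.301. A term that occurs in every document gets idf(t) = 0 regardless of the base.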

Details

Since the analysis is based on a document term matrix, a matrix already stored as a feature of the corpus object will be reused if it matches the case sensitivity setting (caseSens). Otherwise a new matrix will be generated, without replacing the existing one. If no document term matrix is present yet, one will be generated as well, and it can be kept as an additional feature of the resulting object by setting keep_dtm = TRUE.
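The reason a stored matrix must match caseSens can be illustrated with a small base R sketch. build_dtm() is a hypothetical helper written only for this illustration. It is not part of koRpus or tm.plugin.koRpus and is not how read.corp.custom() builds its matrix internally. The idf line assumes the plain log(N/df) form.

```r
# Hypothetical helper (NOT part of koRpus or tm.plugin.koRpus): build a small
# document term matrix from texts that are already tokenized.
build_dtm <- function(docs, caseSens = TRUE) {
  if (!caseSens) {
    # match all tokens in their lower case form
    docs <- lapply(docs, tolower)
  }
  terms <- sort(unique(unlist(docs)))
  # vapply() returns one column per document; t() turns that into
  # one row per document and one column per term, cells holding counts
  dtm <- t(vapply(docs, function(tokens) {
    as.numeric(table(factor(tokens, levels = terms)))
  }, numeric(length(terms))))
  dimnames(dtm) <- list(names(docs), terms)
  dtm
}

docs <- list(
  doc1 = c("The", "cat", "saw", "the", "dog"),
  doc2 = c("the", "dog", "ran")
)

dtm_sens   <- build_dtm(docs, caseSens = TRUE)
dtm_insens <- build_dtm(docs, caseSens = FALSE)

# document frequency: in how many documents each term occurs
df  <- colSums(dtm_insens > 0)
# idf with base 10 (assumed plain log(N/df) form)
idf <- log(nrow(dtm_insens) / df, base = 10)
```

With caseSens = TRUE, "The" and "the" end up in separate columns (6 terms); with caseSens = FALSE they collapse into one column (5 terms). A matrix built under one setting therefore cannot simply be reused for the other.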

Value

An object of the same class as corpus.

Examples

# use readCorpus() to create an object of class kRp.corpus
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(
      path.package("tm.plugin.koRpus"), "examples", "corpus", "Edwards"
    ),
    hierarchy=list(
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )

  myCorpus <- read.corp.custom(myCorpus)
  corpusCorpFreq(myCorpus)
} else {}

[Package tm.plugin.koRpus version 0.4-2 Index]