read.corp.custom {koRpus} | R Documentation |
Import custom corpus data
Description
Read data from a custom corpus into a valid object of class kRp.corp.freq.
Usage
read.corp.custom(corpus, caseSens = TRUE, log.base = 10, ...)
## S4 method for signature 'kRp.text'
read.corp.custom(
corpus,
caseSens = TRUE,
log.base = 10,
dtm = docTermMatrix(obj = corpus, case.sens = caseSens),
as.feature = FALSE
)
Arguments
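corpus | An object of class kRp.text. |
caseSens | Logical. If FALSE, tokens are matched case-insensitively. |
log.base | A numeric value defining the base of the logarithm used for inverse document frequency (idf). See log for details. |
... | Additional options for methods of the generic. |
dtm | A document term matrix of the corpus object, as generated by docTermMatrix. By default it is created from corpus with case.sens = caseSens (see Usage), so you only need to supply it to reuse an existing matrix. |
as.feature | Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use corpusCorpFreq to get the results from such an aggregated object. |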
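To illustrate what caseSens and log.base control, here is a minimal base R sketch that computes document frequencies and idf for a toy corpus of three tokenized documents. It assumes the common definition idf = log(N / df), where N is the number of documents and df the number of documents containing a term; koRpus may compute idf differently internally. The helper toy_idf is hypothetical and not part of koRpus.

```r
# Hypothetical helper, not part of koRpus: idf for a list of tokenized documents,
# assuming the common definition idf = log(N / df)
toy_idf <- function(docs, case.sens = TRUE, log.base = 10) {
  if (!case.sens) {
    # analogous to caseSens = FALSE: compare all tokens in lower case
    docs <- lapply(docs, tolower)
  }
  n_docs <- length(docs)
  # document frequency: number of documents in which each term occurs
  tab <- table(unlist(lapply(docs, unique)))
  df <- setNames(as.vector(tab), names(tab))
  # log.base only rescales the values, it does not change their ranking
  log(n_docs / df, base = log.base)
}

docs <- list(
  c("The", "cat", "sat"),
  c("the", "dog", "sat"),
  c("A", "cat", "ran")
)
toy_idf(docs, case.sens = TRUE)
toy_idf(docs, case.sens = FALSE, log.base = 2)
```

With case.sens = TRUE, "The" and "the" count as separate terms that each occur in only one document; with case.sens = FALSE they merge into "the", which occurs in two documents and therefore gets a lower idf.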
Details
These methods let you perform a basic frequency analysis of a text corpus. Rather than importing precomputed analysis results such as LCC files, they import the corpus material itself. The resulting object is of class kRp.corp.freq, so it can be used for frequency analysis by other functions and methods of this package.
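A basic frequency analysis starts by counting how often each token occurs. The following base R sketch does not use koRpus; the helper toy_freq and its column names are hypothetical, and the statistics stored in kRp.corp.freq objects may differ. It computes absolute counts, percentages and counts per million tokens for a single tokenized text.

```r
# Hypothetical helper, not part of koRpus: a basic word frequency table
toy_freq <- function(tokens, case.sens = TRUE) {
  if (!case.sens) {
    # merge spelling variants that differ only in case
    tokens <- tolower(tokens)
  }
  # count each distinct token, most frequent first
  counts <- sort(table(tokens), decreasing = TRUE)
  n <- length(tokens)
  data.frame(
    word = names(counts),
    freq = as.vector(counts),              # absolute frequency
    pct  = 100 * as.vector(counts) / n,    # share of all tokens in percent
    pmio = 1e6 * as.vector(counts) / n,    # frequency per million tokens
    stringsAsFactors = FALSE
  )
}

tokens <- c("The", "cat", "sat", "on", "the", "mat")
toy_freq(tokens, case.sens = FALSE)
```

Normalised values such as percentages or counts per million make frequencies comparable across texts of different length, which raw counts alone do not.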
Value
Depending on as.feature, either an object of class kRp.corp.freq (as.feature = FALSE, the default), or an object of class kRp.text with the added feature corp_freq containing it (as.feature = TRUE).
See Also
Examples
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
sample_file <- file.path(
path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
)
# read.corp.custom() works on a tokenized text, so tokenize the sample file first
tokenized.obj <- tokenize(
txt=sample_file,
lang="en"
)
# if you call read.corp.custom() without the as.feature argument,
# you will get its results directly
en_corp <- read.corp.custom(
tokenized.obj,
caseSens=FALSE
)
# alternatively, you can also store those results as a
# feature in the object itself
tokenized.obj <- read.corp.custom(
tokenized.obj,
caseSens=FALSE,
as.feature=TRUE
)
# results are now part of the object
hasFeature(tokenized.obj)
corpusCorpFreq(tokenized.obj)
} else {}