
term_freq {R.temis}

R Documentation

term_freq

Description

Study frequencies of chosen terms in the corpus, among documents, or among levels of a variable.

Usage

term_freq(dtm, terms, variable = NULL, by_term = FALSE)

Arguments

`dtm`	A `DocumentTermMatrix`.
`terms`	One or more reference term(s) appearing in `dtm`.
`variable`	An optional vector of values, one per document in `dtm`, giving the groups for which frequencies of the chosen terms should be reported.
`by_term`	Whether the third dimension of the array should be terms instead of levels.
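Grouping by `variable` means occurrences are pooled within each level before frequencies are computed. The base-R sketch below illustrates that idea on a plain count matrix standing in for a `DocumentTermMatrix`. The toy data (`counts`, `variable`) are hypothetical, and it assumes "frequency" means the chosen term's share of all term occurrences in a level. It is not R.temis's implementation, whose output columns may differ.

```r
# Hypothetical toy counts: rows are documents, columns are terms.
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   1, 1, 0,
                   4, 0, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4), c("barrel", "oil", "price")))

# Hypothetical grouping variable: one value per document.
variable <- c("A", "A", "B", "B")

# Pool term occurrences within each level (rows are sorted levels "A", "B").
by_level <- rowsum(counts, group = variable)

# Percent of each level's term occurrences accounted for by "barrel".
pct_barrel <- 100 * by_level[, "barrel"] / rowSums(by_level)
pct_barrel
```

Level "A" pools documents 1 and 2 (2 of 7 occurrences are "barrel"), and level "B" pools documents 3 and 4 (5 of 8).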

Value

A list of matrices, one for each level of the variable, with columns:

"\ in documents where the chosen term is also present.
"\ where the chosen term is also present (rather than in documents where it does not appear), i.e. the percent of cooccurrences for the term.
"Global \ in the corpus (or in the subset of the corpus corresponding to the variable level).
"Level": the number of cooccurrences of the term.
"Global": the number of occurrences of the term in the corpus (or in the subset of the corpus corresponding to the variable level).
"t value": the quantile of a normal distribution corresponding to the probability "Prob.".
"Prob.": the probability of observing such an extreme (high or low) number of occurrences of the term in documents where the chosen term is also present, under a hypergeometric distribution.
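The "Prob." and "t value" columns follow a standard hypergeometric test. The sketch below shows one common way to compute such a pair with base R's `phyper()` and `qnorm()`. The helper `hyper_extreme` and its argument names are hypothetical. It assumes the urn is all term occurrences in the corpus (`n_total`), the "successes" are the term's occurrences (`global`, cf. "Global"), and the draws are all occurrences in the documents of interest (`n_sub`), with `level` (cf. "Level") observed. It also assumes the tail is chosen by comparing `level` with its expected value and that the t value is signed (positive for over-representation). R.temis may differ in these details.

```r
# Hypothetical helper, not part of R.temis.
hyper_extreme <- function(level, global, n_sub, n_total) {
  # level:   occurrences of the term in the documents of interest
  # global:  occurrences of the term in the whole corpus
  # n_sub:   total term occurrences in the documents of interest (draws)
  # n_total: total term occurrences in the corpus (urn size)
  expected <- n_sub * global / n_total
  if (level >= expected) {
    # Upper tail, P(X >= level): over-representation gives a positive t value
    p <- phyper(level - 1, global, n_total - global, n_sub, lower.tail = FALSE)
    t <- qnorm(p, lower.tail = FALSE)
  } else {
    # Lower tail, P(X <= level): under-representation gives a negative t value
    p <- phyper(level, global, n_total - global, n_sub)
    t <- qnorm(p)
  }
  c("t value" = t, "Prob." = p)
}

# 30 occurrences observed where about 10 would be expected
hyper_extreme(level = 30, global = 50, n_sub = 200, n_total = 1000)
```

Choosing the tail from the expected count makes "Prob." measure how extreme the observation is in whichever direction it deviates, matching the "high or low" wording above.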

Examples


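library(R.temis)

# Import a sample Factiva corpus shipped with tm.plugin.factiva
file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
dtm <- build_dtm(corpus)

# Frequencies of the term "barrel", without grouping
term_freq(dtm, "barrel")
# Frequencies of "barrel" for each level of the documents' dates
term_freq(dtm, "barrel", meta(corpus)$Date)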

[Package R.temis version 0.1.3 Index]