R: frequent

frequent_terms {R.temis}

R Documentation

frequent_terms

Description

List terms with the highest number of occurrences in the document-term matrix of a corpus, possibly grouped by the levels of a variable.

Usage

frequent_terms(dtm, variable = NULL, n = 25)

Arguments

`dtm`	A `DocumentTermMatrix`.
`variable`	An optional vector of values giving the groups for which the most frequent terms should be reported.
`n`	The maximal number of terms to report (for each group, if applicable).

Value

A list of matrices, one for each level of the variable, with columns:

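"% Term/Level": the percent of the term's occurrences in all terms occurrences in the level.
"% Level/Term": the percent of the term's occurrences that appear in the level (rather than in other levels).
"Global %": the percent of the term's occurrences in all terms occurrences in the corpus.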
"Level": the number of occurrences of the term in the level ("internal").
"Global": the number of occurrences of the term in the corpus.
"t value": the quantile of a normal distribution corresponding to the probability "Prob.".
"Prob.": the probability of observing such an extreme (high or low) number of occurrences of the term in the level, under a hypergeometric distribution.
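
The "Level", "Global", "Prob." and "t value" columns described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy counts, the `groups` vector and the choice of the smaller of the two hypergeometric tails are assumptions made for illustration, and the values returned by `frequent_terms` may be computed differently.

```r
# Toy document-term counts (hypothetical data): 4 documents x 3 terms.
counts <- matrix(c(5, 0, 1,
                   3, 1, 0,
                   0, 4, 2,
                   1, 6, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4),
                                 c("oil", "price", "market")))
groups <- c("A", "A", "B", "B")   # hypothetical grouping variable

by_level <- rowsum(counts, groups)  # term counts summed within each level
level    <- by_level["A", ]         # occurrences of each term in level "A"
global   <- colSums(counts)         # occurrences of each term in the corpus
n_level  <- sum(level)              # all term occurrences in level "A"
n_total  <- sum(global)             # all term occurrences in the corpus

# Hypergeometric model: n_level occurrences are drawn from n_total,
# of which `global` belong to the term of interest.
p_high <- phyper(level - 1, global, n_total - global, n_level,
                 lower.tail = FALSE)                            # P(X >= level)
p_low  <- phyper(level, global, n_total - global, n_level)      # P(X <= level)

# Assumption: report the smaller tail, and turn it into a signed
# normal quantile (positive when the term is over-represented).
prob    <- pmin(p_high, p_low)
t_value <- ifelse(p_high <= p_low,
                  qnorm(p_high, lower.tail = FALSE),
                  qnorm(p_low))

result <- cbind(level, global, t_value, prob)
dimnames(result) <- list(colnames(counts),
                         c("Level", "Global", "t value", "Prob."))
print(round(result, 4))
```

In this toy corpus, "oil" occurs mostly in level "A" and so gets a positive "t value", while "price" occurs mostly in level "B" and gets a negative one.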

Examples


file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
dtm <- build_dtm(corpus)
frequent_terms(dtm)
frequent_terms(dtm, meta(corpus)$Date)

[Package R.temis version 0.1.3 Index]