mine_text {inlpubs} | R Documentation |
Mine Text
Description
Performs a term frequency text analysis. A term is defined as a word or group of words.
Usage
mine_text(docs, ngmin = 1, ngmax = ngmin, sparse = NULL)
Arguments
docs |
'list' or 'character' vector. Document text to analyze. Each list item contains the extracted text from a single document. |
ngmin , ngmax |
'integer' number. Splits strings into n-grams with the given minimum and maximum numbers of grams. An n-gram is an ordered sequence of n words taken from the body of a text. Requires that the RWeka package is available and that the environment variable JAVA_HOME points to where the Java software is located. Recommended for single text components only. |
sparse |
'numeric' number that is greater than 0 and less than 1.
A threshold of relative document frequency for a term.
It specifies the proportion of documents in which a term must appear to be retained.
For example if you specify |
Details
HTML entities are decoded when the textutils package is available.
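To make the n-gram definition given for ngmin and ngmax concrete, the following is a minimal base R sketch of word n-gram splitting. It is an illustration only and not the package's implementation, which relies on RWeka; the helper name make_ngrams and its tokenization rule (lowercase, split on non-alphanumeric characters) are assumptions made for this example.

```r
# Hypothetical helper, not part of inlpubs: build word n-grams in base R.
make_ngrams <- function(text, ngmin = 1L, ngmax = ngmin) {
  # Lowercase, then split on anything that is not a letter or digit.
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in seq(ngmin, ngmax)) {
    # Stop once n exceeds the number of available words.
    if (n > length(words)) break
    starts <- seq_len(length(words) - n + 1L)
    grams <- vapply(
      starts,
      function(i) paste(words[i:(i + n - 1L)], collapse = " "),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

# Unigrams and bigrams for one short document.
ngrams <- make_ngrams("Pack my brown box.", ngmin = 1L, ngmax = 2L)
print(ngrams)
```

With ngmin = 1 and ngmax = 2, the result holds every single word followed by every pair of adjacent words, in their original order.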
Value
A term-frequency data table giving the number of times each word occurs in the text.
A column in the table represents a single component in the docs argument, and each row provides frequency counts for a particular word (also known as a 'term').
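The layout described above (terms in rows, components of docs in columns) can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the function's actual method: the tokenize helper and the min_prop cutoff are hypothetical, mine_text may apply additional text cleaning that is not reproduced here, and how the sparse argument maps onto such a cutoff is not shown on this page.

```r
# Hypothetical input mirroring the Examples section.
docs <- list(
  "A" = "The quick brown fox jumps over the lazy lazy dog.",
  "B" = c("Pack my brown box.", NA, "Jazz fly brown dog."),
  "C" = NA_character_
)

# Hypothetical helper: split one component into lowercase words,
# dropping missing strings.
tokenize <- function(x) {
  x <- x[!is.na(x)]
  words <- as.character(unlist(strsplit(tolower(x), "[^[:alnum:]]+")))
  words[nzchar(words)]
}

tokens <- lapply(docs, tokenize)
terms <- sort(unique(unlist(tokens)))

# Rows are terms, columns are components of 'docs'.
tf <- vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = terms))),
  integer(length(terms))
)
rownames(tf) <- terms

# Proportion of components in which each term appears at least once.
doc_prop <- rowSums(tf > 0) / ncol(tf)

# 'min_prop' is a hypothetical cutoff used only for this illustration.
min_prop <- 0.5
tf_kept <- tf[doc_prop >= min_prop, , drop = FALSE]
print(tf_kept)
```

In this sketch a component that contains only missing text, such as "C", still gets a column, filled with zero counts.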
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
search_terms
function to search for terms within the resulting term-frequency data table.
make_wordcloud
function to create a word cloud.
Examples
# Documents supplied as a character vector.
d <- c(
  "The quick brown fox jumps over the lazy lazy dog.",
  "Pack my brown box.",
  "Jazz fly brown dog."
) |>
  mine_text()

# Documents supplied as a named list; components may contain missing values (NA).
d <- list(
  "A" = "The quick brown fox jumps over the lazy lazy dog.",
  "B" = c("Pack my brown box.", NA, "Jazz fly brown dog."),
  "C" = NA_character_
) |>
  mine_text()