compute_term_frequency {cranly} | R Documentation |
Compute term frequencies from a vector of text
Description
Compute term frequencies from a vector of text
Usage
compute_term_frequency(
  txt,
  ignore_words = c("www.jstor.org", "www.arxiv.org", "arxiv.org", "provides", "https"),
  stem = FALSE,
  remove_punctuation = TRUE,
  remove_stopwords = TRUE,
  remove_numbers = TRUE,
  to_lower = TRUE,
  frequency = "term"
)
Arguments
txt |
a vector of character strings. |
ignore_words |
a vector of words to be ignored when forming the corpus. |
stem |
should words be stemmed using Porter's stemming algorithm? Default is FALSE. |
remove_punctuation |
should punctuation be removed when forming the corpus? Default is TRUE. |
remove_stopwords |
should English stopwords be removed when forming the corpus? Default is TRUE. |
remove_numbers |
should numbers be removed when forming the corpus? Default is TRUE. |
to_lower |
should all terms be coerced to lower-case when forming the corpus? Default is TRUE. |
frequency |
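the type of term frequencies to return. Options are "term" (the default), "document-term", and "term-document"; see Value. The operations take place as follows: remove special characters, convert to lower-case (depending on the value of to_lower), and then apply the remaining cleaning steps controlled by the other arguments. |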
Details
If txt is a named vector then the names are used as document ids when forming the corpus.
Value
Either a named numeric vector (frequency = "term"), or an object of class tm::DocumentTermMatrix (frequency = "document-term"), or an object of class tm::TermDocumentMatrix (frequency = "term-document").
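Examples
The following is a minimal sketch rather than an example taken from the package, assuming cranly and its tm dependency are installed. The two short documents and their names (doc1, doc2) are made up for illustration. It shows a named input vector and two of the three frequency options.

```r
# Assumes the cranly package (and the tm package it relies on) is installed.
library(cranly)

# A small named character vector; the names serve as document ids.
# The texts and names are illustrative only.
txt <- c(
  doc1 = "Statistical models for network data",
  doc2 = "Network models and text mining"
)

# Default behaviour: overall term frequencies as a named numeric vector.
tf <- compute_term_frequency(txt, frequency = "term")
head(sort(tf, decreasing = TRUE))

# Request a document-by-term matrix instead of a single vector.
dtm <- compute_term_frequency(txt, frequency = "document-term")
class(dtm)
```

Setting stem = TRUE or passing a longer ignore_words vector changes which terms survive the cleaning steps, so it can be worth inspecting names(tf) before relying on the counts.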