rapidrake {rapidraker}    R Documentation
Rapid RAKE
Description
A relatively fast version of the Rapid Automatic Keyword Extraction (RAKE) algorithm. See the paper "Automatic keyword extraction from individual documents" (Rose et al., 2010) for details on how RAKE works.
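For orientation, the RAKE paper cited above scores each word w as deg(w) / freq(w). Here freq(w) counts the occurrences of w in candidate phrases, and deg(w) adds up the lengths of the candidate phrases that w occurs in. A candidate phrase's score is the sum of its word scores. The base-R sketch below illustrates only that scoring step. It uses a hypothetical helper, rake_scores(), that is not rapidraker's implementation, and it assumes the candidate phrases have already been extracted.

```r
# Hypothetical helper: a base-R sketch of RAKE's scoring step,
# not rapidraker's implementation.
# `phrases` holds every candidate phrase occurrence (already split on
# stop words and phrase delimiters), words separated by single spaces.
rake_scores <- function(phrases) {
  words_by_phrase <- strsplit(phrases, " ", fixed = TRUE)
  all_words <- unlist(words_by_phrase)

  # freq(w): occurrences of each word across all candidate phrases
  word_freq <- table(all_words)

  # deg(w): for every occurrence of w, add the length of its phrase
  phrase_len <- rep(lengths(words_by_phrase), lengths(words_by_phrase))
  word_deg <- tapply(phrase_len, all_words, sum)

  # Word score deg(w) / freq(w), as a plain named numeric vector
  word_score <- setNames(
    as.numeric(word_deg[names(word_freq)]) / as.numeric(word_freq),
    names(word_freq)
  )

  # Phrase score: sum of its member word scores
  phrase_score <- vapply(words_by_phrase,
                         function(w) sum(word_score[w]), numeric(1))

  # One row per unique keyword, with its raw occurrence count
  keep <- !duplicated(phrases)
  data.frame(
    keyword = phrases[keep],
    freq = as.integer(table(phrases)[phrases[keep]]),
    score = phrase_score[keep],
    stringsAsFactors = FALSE
  )
}

rake_scores(c("linear constraints", "linear", "constraints"))
```

In this toy input, "linear" and "constraints" each occur twice. Each has a degree of 3, so it scores 1.5, and the two-word phrase scores 3. This shows how RAKE favors longer phrases built from words that co-occur.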
Usage
rapidrake(
txt,
stop_words = slowraker::smart_words,
stop_pos = c("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
word_min_char = 3,
stem = TRUE,
phrase_delims = "[-,.?():;\"!/]"
)
Arguments
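- txt
A character vector, where each element of the vector contains the text for one document.
- stop_words
A vector of stop words that will be removed from your documents. The default value is slowraker::smart_words.
- stop_pos
A vector of part-of-speech (POS) tags. All words whose POS tag appears in stop_pos are treated as stop words. The default value treats every verb tag (VB, VBD, VBG, VBN, VBP, VBZ) as a stop POS.
- word_min_char
The minimum number of characters that a word must have to remain in the corpus. Words with fewer than word_min_char characters are removed.
- stem
Do you want to stem the words before running RAKE? (TRUE or FALSE)
- phrase_delims
A regular expression containing the characters that will be used as phrase delimiters. The default matches hyphens, commas, periods, question marks, parentheses, colons, semicolons, double quotes, exclamation marks, and slashes.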
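The arguments above control which candidate phrases RAKE gets to score. As a rough base-R illustration, the sketch below uses a hypothetical helper, candidate_phrases(), that is not rapidraker's internal code. It splits text at the default phrase_delims pattern, then breaks each fragment at stop words and at words shorter than word_min_char. Several choices here are assumptions made for the example: treating short words as phrase breaks (rather than silently dropping them), lowercasing, and the tiny stand-in stop word list. stop_pos and stemming are not modeled.

```r
# Hypothetical helper: sketches candidate-phrase extraction in base R.
# Assumptions: lowercasing, a tiny stand-in stop word list, and short
# words acting as phrase breaks; POS filtering and stemming are omitted.
candidate_phrases <- function(txt,
                              stop_words = c("some", "that", "has"),
                              word_min_char = 3,
                              phrase_delims = "[-,.?():;\"!/]") {
  # Split the lowercased text into fragments at phrase delimiters
  fragments <- unlist(strsplit(tolower(txt), phrase_delims))
  phrases <- character(0)
  for (frag in fragments) {
    words <- strsplit(trimws(frag), "\\s+")[[1]]
    words <- words[nzchar(words)]
    # Stop words and too-short words end the current phrase
    is_break <- words %in% stop_words | nchar(words) < word_min_char
    # Consecutive kept words share the same run id
    run_id <- cumsum(is_break)
    runs <- split(words[!is_break], run_id[!is_break])
    phrases <- c(phrases, vapply(runs, paste, character(1), collapse = " "))
  }
  unname(phrases)
}

candidate_phrases("some text that has great keywords")
# returns c("text", "great keywords")
```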
Value
An object of class rakelist, which is just a list of data frames (one data frame for each element of txt). Each data frame will have the following columns:
- keyword
A keyword that was identified by RAKE.
- freq
The number of times the keyword appears in the document.
- score
The keyword's score, as per the RAKE algorithm. Keywords with higher scores are considered to be higher quality than those with lower scores.
- stem
If you specified stem = TRUE, this column contains the stemmed version of each keyword. When you choose stemming, the keyword's score (score) is based on its stem, but the reported number of times that the keyword appears (freq) is still based on the raw, unstemmed version of the keyword.
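The sketch below makes the freq/stem distinction concrete. It uses a toy stemmer, crude_stem(), which is hypothetical and simply strips a trailing "s" from each word. It stands in for whatever stemmer rapidrake actually uses and does not reproduce it. The input is three hypothetical keyword occurrences from one document. freq is counted per raw keyword, while both variants collapse to the same stem.

```r
# Toy stand-in for a real stemmer (hypothetical): strips a trailing "s"
# from each word of a space-separated phrase.
crude_stem <- function(phrase) {
  words <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  paste(sub("s$", "", words), collapse = " ")
}

# Hypothetical keyword occurrences found in one document
raw_keywords <- c("great keywords", "great keyword", "great keywords")

kw <- unique(raw_keywords)
out <- data.frame(
  keyword = kw,
  # freq is counted on the raw, unstemmed keyword
  freq = as.integer(table(raw_keywords)[kw]),
  # the stem column groups spelling variants together
  stem = vapply(kw, crude_stem, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
out
```

Both rows share a stem, so under the rule described above they would also share a score, even though their freq values (2 and 1) differ.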
Examples
## Not run:
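library(rapidraker)
# Extract keywords from a single document; the result is a rakelist
rakelist <- rapidrake(txt = "some text that has great keywords")
# Combine the per-document data frames into a single data frame
slowraker::rbind_rakelist(rakelist)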
## End(Not run)