Stopword_Maker {LilRhino} | R Documentation |
Find the $N$ most frequent words in a corpus.
Description
This function finds the $N$ most frequently used words in a corpus. These words are candidate stop words, which can then be removed to prune data sets before training.
Usage
Stopword_Maker(titles, cutoff = 20)
Arguments
titles |
The documents (for example, a character vector with one document per element) in which to look for the most frequent words. |
cutoff |
The number $N$ of most frequently used words to keep as stop words. Defaults to 20. |
Value
output |
A vector of the $N$ most frequent words. |
Author(s)
Travis Barton
Examples
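library(LilRhino)
# A small corpus: each element is one document.
test_set <- c('this is a testset', 'I am searching for a list of words',
              'I like turtles',
              'A rocket would be a fast way of getting to work, but I do not think it is very practical')
# Keep the 4 most frequent words as stop words.
res <- Stopword_Maker(test_set, 4)
print(res)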
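To illustrate the idea of picking the $N$ most frequent words, here is a minimal base R sketch. It is not the implementation used by Stopword_Maker: the helper name top_words_sketch, the lowercasing step, and the tokenization rule are all assumptions made for illustration, so its output may differ from the package's.

```r
# Hypothetical helper, NOT part of LilRhino: a base R sketch of frequency counting.
top_words_sketch <- function(docs, n = 20) {
  # Assumption: lowercase everything and split on any run of characters
  # that is not a letter or an apostrophe.
  tokens <- unlist(strsplit(tolower(docs), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split
  # Count each distinct word and sort from most to least frequent.
  counts <- sort(table(tokens), decreasing = TRUE)
  # Return at most n words (fewer if the corpus has fewer distinct words).
  names(counts)[seq_len(min(n, length(counts)))]
}

test_set <- c('this is a testset', 'I am searching for a list of words',
              'I like turtles',
              'A rocket would be a fast way of getting to work, but I do not think it is very practical')
sketch_res <- top_words_sketch(test_set, 4)
print(sketch_res)
```

Words with equal counts can appear in either order, so only the set of returned words should be relied on when there are ties at the cutoff.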
[Package LilRhino version 1.2.2 Index]