bestDocs {phm} | R Documentation |
Find Informative Documents in a Corpus
Description
Find the documents in a corpus that contain the most high-frequency phrases, and return a corpus containing only those documents.
Usage
bestDocs(co, num = 3L, n = 10L, pd = NULL)
Arguments
co |
A corpus with documents |
num |
Integer with the number of documents to return |
n |
Integer with the number of high frequency phrases to use |
pd |
phraseDoc object for the corpus in co |
Value
A corpus with the num documents that have the most high-frequency
phrases, in order of the number of high-frequency phrases. The
returned corpus has the meta field oldIdx set to the index of the
document in the original corpus, and the meta field hfPhrases set to
the number of high-frequency phrases the document contains.
Examples
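# Build a small corpus of four documents
v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- tm::VCorpus(tm::VectorSource(v1))
# Build the phraseDoc object, keeping phrases with a frequency of at least 2
pd <- phraseDoc(co, min.freq = 2)
# Return the 2 documents containing the most of the 2 high-frequency phrases
bestDocs(co, 2, 2, pd)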
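The Value section mentions the meta fields oldIdx and hfPhrases. The following is a minimal sketch of how they might be inspected on the returned corpus. It assumes the phm and tm packages are installed; the variable name best is introduced here for illustration only. Whether the fields are stored per document or as corpus-level indexed metadata is not stated on this page, so the sketch prints both.

```r
# Sketch only: assumes the phm and tm packages are installed.
library(tm)   # attaches NLP, which provides meta()
library(phm)

v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- VCorpus(VectorSource(v1))
pd <- phraseDoc(co, min.freq = 2)

# 'best' is a hypothetical name for the corpus returned by bestDocs()
best <- bestDocs(co, num = 2, n = 2, pd = pd)

# Assumption: oldIdx and hfPhrases may be stored per document ...
for (i in seq_along(best)) {
  print(meta(best[[i]]))
}

# ... or as indexed metadata on the corpus; print that as well.
print(meta(best, type = "indexed"))
```

Printing both levels avoids relying on where the fields are stored; once that is known, a single call such as meta(best[[1]], "oldIdx") would be enough if the fields turn out to be document-level.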
[Package phm version 1.1.2 Index]