R: Find Informative Documents in a Corpus

bestDocs {phm}

R Documentation

Find Informative Documents in a Corpus

Description

Find the documents in a corpus that contain the most high-frequency phrases, and return a corpus consisting of just those documents.

Usage

bestDocs(co, num = 3L, n = 10L, pd = NULL)

Arguments

`co`	A corpus with documents
`num`	Integer with the number of documents to return
`n`	Integer with the number of high-frequency phrases to use
`pd`	phraseDoc object for the corpus in `co`; if NULL, a phraseDoc will be created for it.

Value

A corpus with the `num` documents that contain the most high-frequency phrases, ordered by the number of high-frequency phrases. Each document in the returned corpus has the meta field `oldIdx` set to its index in the original corpus, and the meta field `hfPhrases` set to the number of high-frequency phrases it contains.
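The ranking idea can be illustrated with a small base R sketch. This is not the phm implementation: `rank_docs` and its `phrases` argument are hypothetical names, phrases are matched as fixed, case-insensitive substrings, and "frequency" is taken to mean the number of documents containing a phrase. All of these are simplifying assumptions.

```r
# Hypothetical stand-in for the idea behind bestDocs(); NOT the phm code.
# Ranks documents by how many of the n most frequent phrases each contains.
rank_docs <- function(docs, phrases, num = 3L, n = 10L) {
  docs_lc <- tolower(docs)
  # hits[i, j] is TRUE when document i contains phrase j
  hits <- sapply(phrases, function(p) grepl(p, docs_lc, fixed = TRUE))
  hits <- matrix(hits, nrow = length(docs), dimnames = list(NULL, phrases))
  # Assumed frequency measure: number of documents containing the phrase
  freq <- sort(colSums(hits), decreasing = TRUE)
  top <- names(freq)[seq_len(min(n, length(phrases)))]
  # Number of top phrases found in each document
  score <- rowSums(hits[, top, drop = FALSE])
  idx <- order(score, decreasing = TRUE)[seq_len(min(num, length(docs)))]
  # Column names mirror the meta fields described above
  data.frame(oldIdx = idx, hfPhrases = score[idx], text = docs[idx])
}

docs <- c("Here is some text to test phrase mining", "phrase mining is fun",
          "Some text is better than no text", "No text, no phrase mining")
res <- rank_docs(docs, phrases = c("phrase mining", "text", "fun"),
                 num = 2L, n = 2L)
print(res)
```

Here the two most frequent phrases are "phrase mining" and "text", and the first and fourth documents are the only ones that contain both, so they are the ones returned.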

Examples

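# Build a small corpus of four documents
v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- tm::VCorpus(tm::VectorSource(v1))
# Create the phraseDoc once so that bestDocs() does not have to rebuild it
pd <- phraseDoc(co, min.freq = 2)
# Return the 2 documents that contain the most of the 2 high-frequency phrases
bestDocs(co, 2, 2, pd)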
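
To inspect the metadata described under Value, something like the following may work. It assumes that `oldIdx` and `hfPhrases` are stored as document-level metadata readable with `tm::meta`, which this page does not state explicitly. If they are stored elsewhere, `meta()` returns `NULL` for those tags.

```r
library(tm)
library(phm)

v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- VCorpus(VectorSource(v1))
pd <- phraseDoc(co, min.freq = 2)
best <- bestDocs(co, 2, 2, pd)

# Assumption: oldIdx and hfPhrases are document-level meta fields
for (i in seq_along(best)) {
  doc <- best[[i]]
  cat("original index:", meta(doc, "oldIdx"),
      "| high-frequency phrases:", meta(doc, "hfPhrases"), "\n")
}
```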

[Package phm version 1.1.2 Index]