phm-package {phm} | R Documentation |
Phrase Mining
Description
Obtain all principal phrases from a corpus or a collection of texts and their frequencies in each of those texts. A principal phrase is a phrase that does not cross punctuation marks, does not start with a stop word, with the exception of the stop words "not" and "no", does not end with a stop word, is frequent within those texts without being double counted, and is meaningful to the user.
Details
The function PhraseDoc will extract all principal phrases from a corpus with documents or a character vector with texts and creates an object of class phraseDoc. The method as.matrix on a phraseDoc object converts the phraseDoc to a term-frequency matrix. The function freqPhrases displays the most frequent principal phrases in a phraseDoc object. The function getDocs will create a frequency matrix with all documents/texts that contain certain phrases, while getPhrases will create a frequency matrix with all phrases present in a specific collection of documents/texts.
Author(s)
Ellie Small
Maintainer: Ellie Small <esmall1@drew.edu>