phraseDoc {phm} | R Documentation |
phraseDoc Creation
Description
Create an object of class phraseDoc. This object holds all principal phrases that occur at least a minimum number of times in a collection of texts, together with the texts they occur in and their positions within those texts.
Usage
phraseDoc(
  co,
  mn = 2,
  mx = 8,
  ssw = stopStartWords(),
  sew = stopEndWords(),
  sp = stopPhrases(),
  min.freq = 2,
  principal = function(phrase, freq) {
    freq >= min.freq
  },
  max.phrases = 1500,
  shiny = FALSE,
  silent = FALSE
)
Arguments
co | A corpus, or a character vector in which each element is the text of one document. |
mn | Minimum number of words in a phrase. |
mx | Maximum number of words in a phrase. |
ssw | A set of words no phrase should start with. |
sew | A set of words no phrase should end with. |
sp | A set of phrases to be excluded. |
min.freq | The minimum frequency of phrases to be included. |
principal | Function that determines if a phrase is a principal phrase. By default, FALSE is returned if the phrase occurs less often than the number in min.freq; otherwise TRUE is returned. |
max.phrases | Maximum number of phrases to be included. |
shiny | TRUE if called from a shiny program. This allows progress to be recorded on a progress meter. The function uses about 100 progress steps, so the phraseDoc should be created inside a withProgress call with the argument max set to at least 100. |
silent | TRUE if you do not want progress messages. |
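The interplay between min.freq and principal may be easier to see in isolation. The following is a minimal base R sketch that does not call phm at all: the frequency vector is invented for illustration, default_principal simply copies the default shown under Usage, and long_principal is a hypothetical custom predicate. How phraseDoc actually invokes the function (once per phrase or on whole vectors) is not stated here, so the vectorized calls below are an assumption.

```r
# Hypothetical phrase frequencies, invented for illustration only;
# they are not output from phraseDoc.
freqs <- c("test text" = 6, "another test text" = 2, "girl will test" = 1)

min.freq <- 2

# Same body as the default shown under Usage: a phrase is principal
# when its frequency reaches min.freq.
default_principal <- function(phrase, freq) {
  freq >= min.freq
}

# Hypothetical custom predicate (not part of phm): additionally require
# that the phrase has at least three words.
long_principal <- function(phrase, freq) {
  n_words <- lengths(strsplit(phrase, " ", fixed = TRUE))
  freq >= min.freq & n_words >= 3
}

keep_default <- default_principal(names(freqs), freqs)
keep_long <- long_principal(names(freqs), freqs)

print(names(freqs)[keep_default])
print(names(freqs)[keep_long])
```

A predicate with the same two arguments, phrase and freq, could then be passed as the principal argument.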
Value
Object of class phraseDoc
Examples
tst <- c("This is a test text",
         "This is a test text 2",
         "This is another test text",
         "This is another test text 2",
         "This girl will test text that man",
         "This boy will test text that man")
phraseDoc(tst)
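As a further sketch, the same texts can be processed with non-default settings. The argument names come from the Usage section; the specific values (phrases of 2 to 4 words, occurring at least 3 times) are arbitrary choices for illustration, and the call is guarded so the snippet still runs when phm is not installed.

```r
tst <- c("This is a test text",
         "This is a test text 2",
         "This is another test text",
         "This is another test text 2",
         "This girl will test text that man",
         "This boy will test text that man")

if (requireNamespace("phm", quietly = TRUE)) {
  # Restrict phrases to 2-4 words and raise the frequency threshold to 3;
  # silent = TRUE suppresses the progress messages.
  pd <- phm::phraseDoc(tst, mn = 2, mx = 4, min.freq = 3, silent = TRUE)
  print(class(pd))
}
```

Because the default principal function compares freq against min.freq, raising min.freq alone should be enough to tighten the filter without supplying a custom function.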