CSeqpat {CSeqpat}    R Documentation
Mining Frequent Contiguous Sequential Patterns in a Text Corpus
Description
Takes the path to a text corpus and a minimum support threshold, and mines frequent contiguous sequential patterns (phrases) from the corpus.
Usage
CSeqpat(filepath, phraselenmin = 1, phraselenmax = 99999, minsupport = 1,
docdelim, stopword = FALSE, stemword = FALSE, lower = FALSE,
removepunc = FALSE)
Arguments
filepath
Path to the text file or text corpus.
phraselenmin
Minimum number of words in a phrase.
phraselenmax
Maximum number of words in a phrase.
minsupport
Minimum absolute support a pattern must reach to be reported.
docdelim
Delimiter that separates documents in the corpus.
stopword
Remove stopwords from the document corpus (logical).
stemword
Perform stemming on the document corpus (logical).
lower
Convert all words in the document corpus to lower case (logical).
removepunc
Remove punctuation from the document corpus (logical).
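The lower, removepunc and stopword options can be approximated in base R as shown below. This is a hypothetical helper (preprocess_doc, not part of CSeqpat) that assumes whitespace tokenisation, treats the [[:punct:]] character class as punctuation, and uses a tiny made-up stopword list; CSeqpat's own tokeniser, stopword list and stemmer may behave differently, and stemming is not shown.

```r
# Hypothetical helper, NOT the CSeqpat implementation: approximates the
# lower, removepunc and stopword options in base R. Stemming is omitted.
preprocess_doc <- function(doc, lower = FALSE, removepunc = FALSE,
                           stopword = FALSE,
                           stopwords = c("a", "an", "the", "of", "and")) {
  if (lower) doc <- tolower(doc)
  # Replace each run of punctuation with a space so words stay separated
  if (removepunc) doc <- gsub("[[:punct:]]+", " ", doc)
  # Split on runs of whitespace and drop any empty tokens
  words <- strsplit(trimws(doc), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  # Drop tokens found in the (illustrative) stopword list
  if (stopword) words <- words[!(tolower(words) %in% stopwords)]
  words
}

preprocess_doc("The Hoagie, an institution!", lower = TRUE,
               removepunc = TRUE, stopword = TRUE)
```

With all three options enabled, the call above returns the tokens "hoagie" and "institution".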
Value
A data frame containing the frequent phrase patterns and their absolute support.
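To illustrate what "frequent contiguous sequential patterns" means, the following is a minimal base-R sketch, not CSeqpat's actual algorithm. It assumes the corpus has already been split into a character vector of documents (the role of docdelim), that words are separated by whitespace, and that absolute support is the number of documents containing a phrase at least once; CSeqpat may define support differently (for example, as total occurrences). The function name mine_contiguous is hypothetical.

```r
# Hypothetical sketch, NOT the CSeqpat implementation. Assumes support is
# the number of documents that contain a phrase at least once.
mine_contiguous <- function(docs, phraselenmin = 1, phraselenmax = 99999,
                            minsupport = 1) {
  support <- integer(0)  # named integer vector: phrase -> document count
  for (doc in docs) {
    words <- strsplit(trimws(doc), "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    n <- length(words)
    maxlen <- min(phraselenmax, n)
    if (n == 0 || phraselenmin > maxlen) next
    phrases <- character(0)
    for (len in phraselenmin:maxlen) {
      # Every contiguous window of `len` words in this document
      for (start in 1:(n - len + 1)) {
        phrases <- c(phrases,
                     paste(words[start:(start + len - 1)], collapse = " "))
      }
    }
    # Count each distinct phrase once per document
    for (p in unique(phrases)) {
      support[p] <- if (is.na(support[p])) 1L else support[p] + 1L
    }
  }
  keep <- support[support >= minsupport]
  data.frame(pattern = as.character(names(keep)), support = unname(keep),
             stringsAsFactors = FALSE)
}

mine_contiguous(c("fresh food year road", "place fresh food"),
                phraselenmin = 1, phraselenmax = 2, minsupport = 2)
```

For the two documents above, only "fresh", "food" and "fresh food" occur in both, so they are the patterns reported, each with support 2.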
Examples
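library(CSeqpat)
test1 <- c("hoagie institution food year road ",
           "place little dated opened weekend fresh food")
tf <- tempfile()
writeLines(test1, tf)  # one text per line
# Phrases of 1 to 2 words with support >= 2; stopwords removed, text lower-cased
CSeqpat(tf, phraselenmin = 1, phraselenmax = 2, minsupport = 2,
        docdelim = "\t", stopword = TRUE, stemword = FALSE,
        lower = TRUE, removepunc = FALSE)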
[Package CSeqpat version 0.1.2 Index]