Interface to the Boilerpipe Java Library


Documentation for package ‘boilerpipeR’ version 1.3.2

Help Pages

boilerpipeR-package Extract the main content from HTML files
ArticleExtractor A full-text extractor which is tuned towards news articles.
ArticleSentencesExtractor A full-text extractor which is tuned towards extracting sentences from news articles.
boilerpipe Extract the main content from HTML files
CanolaExtractor A full-text extractor trained on the 'krdwrd' Canola corpus (see <https://krdwrd.org/trac/attachment/wiki/Corpora/Canola/CANOLA.pdf>).
content WordPress-generated web page (retrieved from the Quantivity blog <https://quantivity.wordpress.com>). The content is saved as a character vector, ready to be extracted.
DefaultExtractor A quite generic full-text extractor.
Extractor Generic extraction function which calls boilerpipe extractors
KeepEverythingExtractor Marks everything as content.
LargestContentExtractor A full-text extractor which extracts the largest text component of a page.
NumWordsRulesExtractor A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
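
The entries above can be tried together in a short session. The following is a minimal sketch, not taken from the package documentation: it assumes boilerpipeR and its Java dependency (via rJava) are installed, that the bundled 'content' data set loads with data(), and that each extractor accepts the HTML as its first argument. The variable names article_text and largest_text are illustrative only.

```r
# Assumption: boilerpipeR and a working Java/rJava setup are installed.
library(boilerpipeR)

# Load the bundled example page (HTML stored as character).
data(content)

# Apply two of the listed extractors to the same HTML input.
article_text <- ArticleExtractor(content)
largest_text <- LargestContentExtractor(content)

# Compare how much text each extractor keeps relative to the raw page.
nchar(content)
nchar(article_text)
nchar(largest_text)
```

Running several extractors on the same page and comparing the lengths of their output is a quick way to see which one suits a given site, since the extractors differ in how aggressively they discard boilerplate.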