Interface to the Boilerpipe Java Library

Documentation for package ‘boilerpipeR’ version 1.3.2

boilerpipeR-package	Extract the main content from HTML files
ArticleExtractor	A full-text extractor which is tuned towards news articles.
ArticleSentencesExtractor	A full-text extractor which is tuned towards extracting sentences from news articles.
boilerpipe	Extract the main content from HTML files
CanolaExtractor	A full-text extractor trained on the 'krdwrd' Canola corpus (see <https://krdwrd.org/trac/attachment/wiki/Corpora/Canola/CANOLA.pdf>).
content	WordPress-generated web page (retrieved from the Quantivity blog <https://quantivity.wordpress.com>). The content is stored as a character vector, ready for extraction.
DefaultExtractor	A generic full-text extractor.
Extractor	Generic extraction function which calls boilerpipe extractors
KeepEverythingExtractor	Marks everything as content.
LargestContentExtractor	A full-text extractor which extracts the largest text component of a page.
NumWordsRulesExtractor	A generic full-text extractor based solely on the number of words per block (the current, the previous, and the next block).
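
The entries above can be combined into a short session. The following is a minimal sketch, not taken from the package documentation: it assumes boilerpipeR and its Java dependency are installed, that each named extractor accepts the HTML as a character value in its first argument, and that Extractor takes the extractor name followed by the content. The variable names article_text and default_text are illustrative only.

```r
# Sketch only: assumes boilerpipeR (and the Java runtime it depends on) is installed.
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  library(boilerpipeR)

  # 'content' is the example dataset listed in the index above.
  data(content)

  # Call one of the named extractors directly on the HTML text.
  article_text <- ArticleExtractor(content)

  # Assumed calling convention: extractor name first, then the content.
  default_text <- Extractor("DefaultExtractor", content)

  # Show the beginning of each result for a quick comparison.
  cat(substr(article_text, 1, 300), "\n")
  cat(substr(default_text, 1, 300), "\n")
}
```

Comparing the output of two extractors on the same page is a simple way to judge which one suits a given site layout.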