boilerpipeR-package |
Extract the main content from HTML files |
ArticleExtractor |
A full-text extractor which is tuned towards news articles. |
ArticleSentencesExtractor |
A full-text extractor which is tuned towards extracting sentences from news articles. |
boilerpipe |
Extract the main content from HTML files |
CanolaExtractor |
A full-text extractor trained on the 'krdwrd' Canola corpus (see 'https://krdwrd.org/trac/attachment/wiki/Corpora/Canola/CANOLA.pdf'). |
content |
WordPress-generated web page (retrieved from the Quantivity Blog <https://quantivity.wordpress.com>). The content is stored as a character string, ready to be extracted. |
DefaultExtractor |
A quite generic full-text extractor. |
Extractor |
Generic extraction function which calls boilerpipe extractors |
KeepEverythingExtractor |
Marks everything as content. |
LargestContentExtractor |
A full-text extractor which extracts the largest text component of a page. |
NumWordsRulesExtractor |
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block). |
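
The entries above can be combined as in the following minimal sketch. It assumes that boilerpipeR and its Java dependency (via rJava) are installed, that each extractor accepts the page as a character string in its first argument, and that `Extractor` takes the extractor's name as a string in its first argument; the variable names `article_text` and `default_text` are illustrative only.

```r
# Sketch only: runs the extraction if boilerpipeR is available.
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  library(boilerpipeR)

  # 'content' is the example web page shipped with the package.
  data(content)

  # Call a specific extractor directly on the HTML string ...
  article_text <- ArticleExtractor(content)

  # ... or go through the generic function, naming the extractor
  # (assumed: the first argument is the extractor name as a string).
  default_text <- Extractor("DefaultExtractor", content)

  # Print the beginning of the extracted main text.
  cat(substr(article_text, 1, 300), "\n")
}
```

Which extractor works best depends on the page: ArticleExtractor is tuned towards news articles, while DefaultExtractor and NumWordsRulesExtractor are more generic.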