boilerpipeR-package {boilerpipeR} | R Documentation
Extract the main content from HTML files
Description
boilerpipeR provides an interface to the boilerpipe Java library, created by Christian Kohlschutter (https://github.com/kohlschutter/boilerpipe). The library implements robust heuristics to extract the main content from HTML files, removing unnecessary elements such as ads, banners, and headers/footers.
Author(s)
Mario Annau mario.annau@gmail
See Also
Extractor
DefaultExtractor
ArticleExtractor
Examples
## Not run:
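# Load the 'content' example data set
data(content)
# Extract the main text with the default extractor
extract <- DefaultExtractor(content)
# Print the extracted text
cat(extract)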
## End(Not run)
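The same pattern should apply to the other extractors listed under See Also. The following is a minimal sketch, not taken from the package documentation: it assumes that ArticleExtractor accepts an HTML document as a single character string, and the `html` snippet is a made-up illustration. The call is guarded so that the code is skipped when boilerpipeR (or its Java dependency) is not available.

```r
# Hypothetical HTML snippet, used for illustration only
html <- paste0(
  "<html><head><title>Example page</title></head><body>",
  "<div id='nav'><a href='/'>Home</a> <a href='/about'>About</a></div>",
  "<div id='main'><h1>Example article</h1>",
  "<p>This paragraph stands in for the main content of the page. ",
  "It is deliberately longer than the navigation and footer text, ",
  "because content extractors tend to favour longer text blocks.</p></div>",
  "<div id='footer'>Copyright notice</div>",
  "</body></html>"
)

# Run the extractor only if the package can be loaded
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  # Assumption: ArticleExtractor takes the HTML as a character string
  article <- boilerpipeR::ArticleExtractor(html)
  cat(article)
}
```

For a very short document such as this one, the extractor may return little or no text, since its heuristics are tuned for real web pages.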
[Package boilerpipeR version 1.3.2 Index]