htm2txt {htm2txt} R Documentation
Convert an HTML Document to Plain Text by Stripping Off All HTML Tags
Description
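Converts an HTML document to plain text by stripping off all HTML tags. List items ("li" tags) and horizontal rules ("hr" tags) are replaced by configurable strings rather than simply removed.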
Usage
htm2txt(htm, list = "\n• ", pagebreak = "\n\n----------\n\n")
Arguments
htm
A character vector containing an HTML document to be converted into plain text. Other objects are coerced into character vectors.
list
A character string that replaces "li" tags, i.e. the numbering or bullet that introduces each list item. The default is a line break followed by a bullet character and a space.
pagebreak
A character string that replaces "hr" tags, which mark a thematic break in the content or a page break. The default is two line breaks, a line of ten hyphens, and two more line breaks.
Value
A character vector containing the plain text converted from the HTML document.
Examples
text = htm2txt("<html><body>html texts</body></html>")
text = htm2txt(c("Hello<p>World", "Goodbye<br>Friends"))
text = htm2txt("<p>Menu:</p><ul><li>Coffee</li><li>Tea</li></ul>", list = "\n- ")
text = htm2txt("Page 1<hr>Page 2", pagebreak = "\n\n[NEW PAGE]\n\n")
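For illustration only, the substitutions described under Arguments can be approximated in base R with regular expressions. The function strip_tags_sketch below is hypothetical and is not the htm2txt implementation. It assumes that opening "li" tags become the list string, "hr" tags become the pagebreak string, "br" tags become line breaks, and every remaining tag is dropped. It does nothing else, e.g. HTML entities such as &amp; are left untouched.

```r
# Hypothetical sketch of tag stripping; NOT the code used by htm2txt().
strip_tags_sketch <- function(htm, list = "\n\u2022 ",
                              pagebreak = "\n\n----------\n\n") {
  htm <- as.character(htm)  # coerce other objects to character, as htm2txt documents

  # Opening <li> tags (with or without attributes) become the list marker;
  # the whitespace requirement avoids matching tags such as <link>.
  htm <- gsub("<li([[:space:]][^>]*)?>", list, htm, ignore.case = TRUE)

  # <hr>, <hr/> and <hr /> become the page-break string.
  htm <- gsub("<hr([[:space:]][^>]*)?/?>", pagebreak, htm, ignore.case = TRUE)

  # <br> variants become a single line break (an assumption of this sketch).
  htm <- gsub("<br([[:space:]][^>]*)?/?>", "\n", htm, ignore.case = TRUE)

  # Finally, drop every remaining tag, including closing tags like </li>.
  gsub("<[^>]+>", "", htm)
}

strip_tags_sketch("<p>Menu:</p><ul><li>Coffee</li><li>Tea</li></ul>", list = "\n- ")
```

A regex-based approach like this is easy to follow but fragile on malformed markup, comments, or ">" inside attribute values; a real HTML parser is more robust when that matters.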
[Package htm2txt version 2.2.2 Index]