| HTMLencode {textutils} | R Documentation |
Decode and Encode HTML Entities
Description
Decode and encode HTML entities.
Usage
HTMLdecode(x, named = TRUE, hex = TRUE, decimal = TRUE)
HTMLencode(x, use.iconv = FALSE, encode.only = NULL)
HTMLrm(x, ...)
Arguments
x |
a character vector |
use.iconv |
logical. Should conversion via iconv be tried? |
named |
logical: replace named character references? |
hex |
logical: replace hexadecimal character references? |
decimal |
logical: replace decimal character references? |
encode.only |
character: the characters to be replaced; by default (NULL), all characters for which named entities exist |
... |
other arguments |
Details
HTMLdecode replaces named, hexadecimal and decimal
character references as defined by HTML5 (see
References) with characters. The resulting character vector
is marked as UTF-8 (see Encoding).
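The three kinds of reference can be illustrated with a short sketch. It assumes that textutils is installed; the input vector refs and the variable code.point are made up for illustration. No output is shown, since it has not been verified here.

```r
## Three ways to write an ampersand as an HTML character reference:
## named, decimal and hexadecimal.
refs <- c("&amp;", "&#38;", "&#x26;")

## Decimal 38 and hexadecimal 26 denote the same code point.
code.point <- strtoi("26", base = 16L)
stopifnot(code.point == 38L)
intToUtf8(code.point)

## Assumption: textutils is installed; otherwise this part is skipped.
if (requireNamespace("textutils", quietly = TRUE)) {
    ## replace all three kinds of reference
    print(textutils::HTMLdecode(refs))

    ## leave hexadecimal references as they are
    print(textutils::HTMLdecode(refs, hex = FALSE))
}
```

Switching off one kind of reference may be useful when, say, numeric references should survive for later processing.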
HTMLencode replaces UTF-8-encoded
substrings with HTML5 named entities (a.k.a.
“named character references”). A semicolon
‘;’ will not be replaced by the entity
‘&semi;’. Other than that, however,
HTMLencode is quite thorough in its job: it will
replace all characters for which named entities exist, even
‘,’ (with ‘&comma;’) and ‘?’ (with ‘&quest;’). You
can restrict the characters to be replaced by specifying
encode.only.
HTMLrm removes HTML tags. All content
between style and head tags is removed, as are
comments. Note that each element of x is considered
a single HTML document; so a multiline document
should first be pasted/collapsed into a single string.
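Collapsing a multiline document could look as follows. This is a minimal sketch, assuming that textutils is installed; the document in html.lines is made up and stands in for what readLines would return.

```r
## A small HTML document, one line per element, as readLines()
## would return it (made-up content).
html.lines <- c("<html>",
                "<head><title>A title</title></head>",
                "<body>",
                "<p>Max &amp; Moritz</p>",
                "</body>",
                "</html>")

## Collapse the lines, so that the whole document is a single
## element of x.
doc <- paste(html.lines, collapse = "\n")
stopifnot(length(doc) == 1L)

## Assumption: textutils is installed; otherwise this part is skipped.
if (requireNamespace("textutils", quietly = TRUE)) {
    txt <- textutils::HTMLrm(doc)

    ## decode any character references that remain
    print(textutils::HTMLdecode(txt))
}
```

Without the collapsing step, each line would be treated as a document of its own, so a tag or comment that spans several lines might not be recognised.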
Value
character
Author(s)
Enrico Schumann
References
https://www.w3.org/TR/html5/syntax.html#named-character-references
https://html.spec.whatwg.org/multipage/syntax.html#character-references
See Also
Examples
HTMLdecode(c("Max &amp; Moritz", "4 &lt; 9"))
## [1] "Max & Moritz" "4 < 9"
HTMLencode(c("Max & Moritz", "4 < 9"))
## [1] "Max &amp; Moritz" "4 &lt; 9"
HTMLencode("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"
HTMLencode("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"
HTMLrm("before <a href='http://enricoschumann.net'>LINK</a> after")
## [1] "before LINK after"