utf8_normalize {utf8} | R Documentation |
Text Normalization
Description
Transform text to normalized form, optionally mapping to lowercase and applying compatibility maps.
Usage
utf8_normalize(x, map_case = FALSE, map_compat = FALSE,
map_quote = FALSE, remove_ignorable = FALSE)
Arguments
x |
character object. |
map_case |
a logical value indicating whether to apply Unicode case mapping to the text. For most languages, this transformation changes uppercase characters to their lowercase equivalents. |
map_compat |
a logical value indicating whether to apply the Unicode compatibility mappings (those required for the NFKC and NFKD normal forms) to the characters. |
map_quote |
a logical value indicating whether to replace curly single quotes and Unicode apostrophe characters with ASCII apostrophe (U+0027). |
remove_ignorable |
a logical value indicating whether to remove Unicode "default ignorable" characters like zero-width spaces and soft hyphens. |
Details
utf8_normalize converts the elements of a character object to Unicode normalized composed form (NFC) while applying the character maps specified by the map_case, map_compat, map_quote, and remove_ignorable arguments.
Value
The result is a character object with the same attributes as x but with Encoding set to "UTF-8".
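As a minimal sketch of the statement above, assuming the utf8 package is installed, the following checks that names survive normalization and that the result is marked as UTF-8. The element name `sign` and the variables `x` and `y` are illustrative only.

```r
library(utf8)

# Named character vector holding the Angstrom sign (U+212B);
# the name "sign" is a made-up label for illustration
x <- c(sign = "\u212b")
y <- utf8_normalize(x)

names(y)      # attributes such as names are carried over from x
Encoding(y)   # declared encoding of the normalized (non-ASCII) element
```

Note that R cannot mark pure ASCII strings with an encoding, so `Encoding` reports "unknown" for elements that contain only ASCII characters.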
See Also
Examples
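# three encodings of the same letter: precomposed U+00C5, "A" plus
# combining ring above (U+0041 U+030A), and the Angstrom sign (U+212B)
angstrom <- c("\u00c5", "\u0041\u030a", "\u212b")
utf8_normalize(angstrom) == "\u00c5"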
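The optional maps can be illustrated one at a time. This is a hedged sketch that assumes the utf8 package is installed; the input strings and variable names are chosen purely for illustration.

```r
library(utf8)

# map_case: uppercase characters are mapped to lowercase
folded <- utf8_normalize("Hello", map_case = TRUE)
folded == "hello"

# map_compat: the "fi" ligature (U+FB01) is replaced by the letters "fi"
compat <- utf8_normalize("\ufb01", map_compat = TRUE)
compat == "fi"

# map_quote: right single quotation mark (U+2019) becomes ASCII apostrophe
quoted <- utf8_normalize("don\u2019t", map_quote = TRUE)
quoted == "don't"

# remove_ignorable: the soft hyphen (U+00AD) is dropped
cleaned <- utf8_normalize("co\u00adoperate", remove_ignorable = TRUE)
cleaned == "cooperate"
```

Each flag defaults to FALSE, so with the defaults only NFC normalization is applied and inputs like the ligature above pass through unchanged.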