encoding {tau} | R Documentation |
Adapt the (Declared) Encoding of a Character Vector
Description
Functions for testing and adapting the (declared) encoding
of the components of a vector of mode character.
Usage
is.utf8(x)
is.ascii(x)
is.locale(x)
translate(x, recursive = FALSE, internal = FALSE)
fixEncoding(x, latin1 = FALSE)
Arguments
x | a vector (of character). |
recursive | option to process list components. |
internal | option to use internal translation. |
latin1 | option to assume "latin1" for strings that are not valid UTF-8. |
Details
is.utf8
tests if the components of a vector of character
are true UTF-8 strings, i.e. contain one or more valid UTF-8
multi-byte sequence(s).
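
The test described above can be approximated with base R alone. The following is only a sketch under that reading of the description, not tau's implementation: is_true_utf8 is a hypothetical helper name, and a version of R that provides validUTF8() is assumed.

```r
## Hypothetical helper, not part of tau: a string counts as "true UTF-8"
## here if it is valid UTF-8 and contains at least one non-ASCII byte.
is_true_utf8 <- function(x) {
    ## byte-level validity check from base R
    valid <- validUTF8(x)
    ## any byte above 0x7f means the string is not pure ASCII
    high <- vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
                   logical(1), USE.NAMES = FALSE)
    valid & high
}

## pure ASCII, a valid two-byte UTF-8 sequence, a lone latin1 byte
x <- c("aa", "a\xc3\xa4", "a\xe4")
is_true_utf8(x)
## FALSE TRUE FALSE
```

Pure ASCII strings are valid UTF-8 but contain no multi-byte sequence, which is why the sketch reports FALSE for them.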
is.locale
tests if the components of a vector of character
are in the encoding of the current locale.
translate
encodes the components of a vector of character
in the encoding of the current locale. This includes the names
attribute of vectors of arbitrary mode. If recursive = TRUE,
the components of a list are processed. If internal = TRUE,
multi-byte sequences that are invalid in the encoding of the current
locale are changed to literal hex numbers (see FIXME).
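
To give an idea of what a literal hex rendering can look like, base R's iconv offers a comparable substitution through its sub argument. This is an analogy only: whether translate with internal = TRUE produces the same format is not stated here, and the input string is made up for illustration.

```r
## Base R only; this is not tau's translate().
x <- c(key = "fa\xe7ile")      # byte 0xe7 is c-cedilla in latin1
Encoding(x) <- "latin1"

## Bytes that cannot be represented in the target encoding are
## replaced by their hex code in angle brackets.
y <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")
unname(y)
## "fa<e7>ile"
```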
fixEncoding
sets the declared encoding of the components of
a vector of character to their correct or preferred values. If
latin1 = TRUE, strings that are not valid UTF-8 strings are
declared to be in "latin1". On the other hand, strings that
are true UTF-8 strings are declared to be in "UTF-8" encoding.
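
The rule described above can be mimicked with base R as a rough sketch. declare_encoding is a hypothetical helper, not tau's fixEncoding; in particular, leaving invalid strings untouched when latin1 = FALSE is an assumption of this sketch.

```r
## Hypothetical helper, not part of tau: set the declared encoding
## from the byte content of each string.
declare_encoding <- function(x, latin1 = FALSE) {
    valid <- validUTF8(x)
    high <- vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
                   logical(1), USE.NAMES = FALSE)
    enc <- Encoding(x)
    ## valid multi-byte content is declared UTF-8
    enc[high & valid] <- "UTF-8"
    ## optionally treat everything else that is non-ASCII as latin1
    if (latin1)
        enc[high & !valid] <- "latin1"
    Encoding(x) <- enc
    x
}

x <- c("aa", "a\xc3\xa4", "a\xe4")
Encoding(declare_encoding(x, latin1 = TRUE))
## "unknown" "UTF-8" "latin1"
```

ASCII strings stay "unknown" because R never marks them with an encoding.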
Value
The same type of object as x, with the (declared) encoding
possibly changed.
Note
Currently translate uses iconv and therefore is not
guaranteed to work on all platforms.
Author(s)
Christian Buchta
References
FIXME PCRE, RFC 3629
See Also
Examples
## Note that we assume R runs in a UTF-8 locale
text <- c("aa", "a\xe4")
Encoding(text) <- c("unknown", "latin1")
is.utf8(text)
is.ascii(text)
is.locale(text)
## implicit translation
text
##
t1 <- iconv(text, from = "latin1", to = "UTF-8")
Encoding(t1)
## oops: with the lower-case name "utf-8" the result may not be marked as UTF-8
t2 <- iconv(text, from = "latin1", to = "utf-8")
Encoding(t2)
t2
is.locale(t2)
##
t2 <- fixEncoding(t2)
Encoding(t2)
## explicit translation
t3 <- translate(text)
Encoding(t3)