casefuns {Unicode}    R Documentation
Unicode Case Conversions
Description
Default Unicode algorithms for case conversion.
Usage
u_to_lower_case(x)
u_to_upper_case(x)
u_to_title_case(x)
u_case_fold(x)
Arguments
x    R objects (see Details).
Details
These functions are generic functions, with methods for the Unicode
character classes (u_char, u_char_range, and u_char_seq) which suitably
apply the case mappings to the Unicode characters given by x, and a
default method which treats x as a vector of “Unicode strings”, and
returns a vector of UTF-8 encoded character strings with the results
of the case conversion of the elements of x.
Currently, only the unconditional case maps are available for conversion to lower, upper or title case: other variants may be added eventually.
Currently, conversion to title case is only available for u_char
objects. Other methods will be added eventually (once the Unicode text
segmentation algorithm is implemented for detecting word boundaries).
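The title case method for u_char objects can be sketched as follows. This is a minimal illustration, assuming the Unicode package is attached; it uses U+01C6 (LATIN SMALL LETTER DZ WITH CARON), whose title case form in the Unicode Character Database (U+01C5) differs from its upper case form (U+01C4). The variable names are illustrative only.

```r
library(Unicode)
## U+01C6 LATIN SMALL LETTER DZ WITH CARON, given as an integer code point.
dz <- as.u_char(0x01C6L)
## Title case and upper case mappings are distinct for this character.
tc <- u_to_title_case(dz)
uc <- u_to_upper_case(dz)
## Convert the resulting character sequences to UTF-8 strings for display.
vapply(tc, intToUtf8, "")
vapply(uc, intToUtf8, "")
```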
Currently, u_case_fold only performs full case folding using the
Unicode case mappings with status “C” and “F”: other variants will be
added eventually.
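Full case folding is typically used for caseless comparison of strings. The following is a minimal sketch, assuming the Unicode package is attached; the helper caseless_equal is hypothetical and not part of the package.

```r
library(Unicode)
## Hypothetical helper: compare two character vectors after full case
## folding with the default method of u_case_fold().
caseless_equal <- function(a, b) {
    unname(u_case_fold(a)) == unname(u_case_fold(b))
}
## U+00DF (sharp s) has a status "F" folding to "ss", so these two
## spellings are expected to compare equal after folding.
caseless_equal("Stra\u00dfe", "STRASSE")
## Strings that differ in more than case remain distinct.
caseless_equal("abc", "abd")
```

Folding both sides, rather than converting one side to lower or upper case, is the approach the Unicode Standard describes for caseless matching.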
Value
For the methods for the Unicode character classes, a u_char_seq
vector of Unicode character sequences with the conversions of the
characters in x.
For the default method, a vector of UTF-8 encoded character strings
with the results of the case conversions of the elements of x.
Examples
## Latin upper case letters A to Z:
x <- as.u_char(as.u_char_range("0041..005A"))
## In case we did not know the code points, we could use e.g.
x <- as.u_char(utf8ToInt(paste(LETTERS, collapse = "")))
vapply(x, intToUtf8, "")
## Unicode character method:
vapply(u_to_lower_case(x), intToUtf8, "")
## Default method:
u_to_lower_case(LETTERS)
u_case_fold("Hi Dave.")
## More interesting stuff: sharp s.
u_to_upper_case("heiß")
## Note that the default full upper case mapping of U+00DF (LATIN SMALL
## LETTER SHARP S) is *not* to U+1E9E (LATIN CAPITAL LETTER SHARP S).
u_case_fold("heiß")