
u_char_names {Unicode}

R Documentation

Unicode Character Names

Description

Find the names or labels of Unicode characters, or find Unicode characters by name.

Usage

u_char_name(x)
u_char_from_name(x, type = c("exact", "grep"), ...)
u_char_label(x)

Arguments

`x`	for `u_char_name` and `u_char_label`, an R object which can be coerced to a `u_char` vector of Unicode characters via `as.u_char`; for `u_char_from_name`, a character vector.
`type`	one of `"exact"` or `"grep"`, or an abbreviation thereof.
`...`	arguments to be passed to `grepl` when `type = "grep"` is used for pattern matching.

Details

The Unicode Standard provides a convention for labeling code points that do not have character names (control, reserved, noncharacter, private-use and surrogate code points). These labels can be obtained with `u_char_label`.
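As a rough illustration of the labeling convention, the following sketch requests the labels of one code point from each of four of the unnamed categories. The code points are illustrative choices, not taken from the package documentation, and the sketch assumes the Unicode package is installed.

```r
library(Unicode)

## Illustrative code points without character names (assumed choices):
## a control (U+0007), a surrogate (U+D800), a private-use code point
## (U+E000) and a noncharacter (U+FFFF).
x <- as.u_char(c(0x0007L, 0xD800L, 0xE000L, 0xFFFFL))

## Labels following the Unicode Standard convention, one per code point.
labels <- u_char_label(x)
labels
```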

By default, exact matching is used for finding Unicode characters by name. When `type = "grep"`, `grepl` is used for matching `x` against the Unicode character names; currently, Hangul syllable and CJK Unified Ideograph names are ignored in this case.
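The two matching modes can be sketched as follows. This is a minimal example that assumes the Unicode package is installed and that extra arguments reach `grepl` unchanged; `ignore.case` is a standard `grepl` argument, and the search strings are illustrative.

```r
library(Unicode)

## Exact matching (the default): the full character name must be given.
exact <- u_char_from_name("LATIN SMALL LETTER A")
exact

## Pattern matching: arguments after `type` are handed on to grepl().
## Here ignore.case = TRUE lets a lower-case pattern match the names.
hits <- u_char_from_name("\\bdigit one\\b", type = "grep", ignore.case = TRUE)
u_char_name(hits)
```

Other `grepl` arguments, such as `fixed = TRUE` for literal matching, would presumably be passed along in the same way.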

Value

For `u_char_name` and `u_char_label`, a character vector with the names or labels, respectively, of the corresponding Unicode characters.

For `u_char_from_name`, a `u_char` object giving the Unicode characters whose names match the given names: exactly by default, or as `grepl` patterns when `type = "grep"`.

Examples

x <- as.u_char(utf8ToInt("Austria"))
u_char_name(x)

## Derived Hangul syllable character names are also supported for
## finding characters by exact matching:
x <- u_char_name("0xAC00")
x
u_char_from_name(x)

## Find all Unicode characters with name matching 'DIGIT ONE'.
x <- u_char_from_name("\\bDIGIT ONE\\b", "g")
## And show their names.
u_char_name(x)

[Package Unicode version 15.1.0-1 Index]