convert_kanji {kanjistat} | R Documentation |
Convert between kanji formats
Description
Accept any interpretable representation of kanji in terms of index numbers,
UTF-8 character strings of length 1, UTF-8 codepoints or kanjivec objects,
and convert it to all or any of these formats.
Usage
convert_kanji(
key,
output = c("all", "index", "character", "hexmode", "kanjivec"),
simplify = TRUE
)
Arguments
key | an atomic vector or list of kanji in any combination of formats. |
output | a string describing the desired output: one of "all", "index", "character", "hexmode" or "kanjivec". |
simplify | logical. Whether to simplify the output to an atomic vector or keep the structure of the original vector. In either case it depends on output whether this is possible. |
Details
Index numbers are in terms of the order in kbase. UTF-8 codepoints are
usually of class "hexmode", but character strings starting
with "0x" or "0X" are also accepted in the key.
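The relation between a kanji character, its codepoint of class "hexmode" and the corresponding "0x" string can be sketched in base R. This is only an illustration of the formats described above; it does not call convert_kanji, and the variable names are chosen here for the example.

```r
# Base R only: one kanji in three of the formats mentioned above.
ch <- "\u99ac"                          # UTF-8 character string of length 1
cp <- as.hexmode(utf8ToInt(ch))         # codepoint as class "hexmode"
hex_string <- paste0("0x", format(cp))  # character string starting with "0x"

# Going back from the "0x" string to the character:
# strip the prefix, parse as base 16, convert the codepoint to UTF-8.
back <- intToUtf8(strtoi(sub("^0[xX]", "", hex_string), 16L))

print(class(cp))    # "hexmode"
print(hex_string)   # "0x99ac"
print(back == ch)   # TRUE
```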
For output = "kanjivec", the GitHub package kanjistat.data has to be
available or an error is returned. For output = "all", component kanjivec
is set to NA if kanjistat.data is not available.
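Since output = "kanjivec" fails without kanjistat.data, a caller may want to check for the package first. The following is a minimal sketch using base R's requireNamespace; the variable names and the fallback to NULL are assumptions made for this example, not behaviour of convert_kanji itself.

```r
# Check availability before requesting kanjivec output.
has_pkg  <- requireNamespace("kanjistat", quietly = TRUE)
has_data <- requireNamespace("kanjistat.data", quietly = TRUE)

if (has_pkg && has_data) {
  # Both packages are installed, so the kanjivec conversion can be attempted.
  kv <- kanjistat::convert_kanji("0x99ac", output = "kanjivec")
} else {
  # Hypothetical fallback for this sketch: skip the conversion.
  kv <- NULL
  message("kanjistat and/or kanjistat.data not installed; skipping kanjivec output")
}
```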
Value
A vector of the same length as key. If simplify is TRUE, this is an
atomic vector for output = "index", "character" or "hexmode", and a list
for output = "kanjivec" or "all". If simplify is FALSE, the original
structure (atomic or list) is kept whenever possible.
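The effect of simplify can be inspected with a key given as a list in mixed formats. This is a hedged sketch that assumes kanjistat is installed; the key is chosen here for illustration, and no particular printed output is claimed.

```r
# Compare the two settings of simplify on a list key in mixed formats.
if (requireNamespace("kanjistat", quietly = TRUE)) {
  key <- list(as.hexmode("99ac"), 500)   # a codepoint and an index number

  simplified <- kanjistat::convert_kanji(key, output = "character", simplify = TRUE)
  kept       <- kanjistat::convert_kanji(key, output = "character", simplify = FALSE)

  str(simplified)  # inspect the structure with simplify = TRUE
  str(kept)        # inspect the structure with simplify = FALSE
}
```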
Examples
convert_kanji(as.hexmode("99ac"))
convert_kanji("0x99ac") # same
convert_kanji(500, "character") == kbase$kanji[500] # TRUE