read_kanjidic2 {kanjistat}    R Documentation

Read a KANJIDIC2 file

Description

Read a KANJIDIC2 file, perform basic validity checks, and either transform the data to a standardized list or keep it as an object of class xml_document (package xml2).

Usage

read_kanjidic2(fpath = NULL, output = c("list", "xml"))

Arguments

fpath

the path to a local KANJIDIC2 file. If NULL (the default), the most recent KANJIDIC2 file is downloaded from https://www.edrdg.org/kanjidic/kanjidic2.xml.gz after asking for confirmation.

output

one of "list" or "xml". The desired type of output.

Details

KANJIDIC2 contains detailed information on all of the 13108 kanji in three main Japanese standards (JIS X 0208, 0212 and 0213). The KANJIDIC files have been compiled and maintained by Jim Breen since 1991, with the help of various other people. The copyright is now held by the Electronic Dictionary Research and Development Group (EDRDG). The files are made available under the Creative Commons BY-SA 4.0 license. See https://www.edrdg.org/wiki/index.php/KANJIDIC_Project for details on the contents of the files and their license.

If output = "xml", some minimal checks are performed (high level structure and total number of kanji).
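As a rough illustration of what such minimal checks could look like, the sketch below uses xml2 on a tiny hand-written stand-in document. It is not the package's actual implementation; the toy document, the variable names, and the expected count of 2 are assumptions made for this example (the real file is documented above as containing 13108 kanji).

```r
library(xml2)

# Hand-written miniature stand-in for a KANJIDIC2 file (two entries only).
doc <- read_xml('<kanjidic2>
  <header/>
  <character><literal>&#x4E9C;</literal></character>
  <character><literal>&#x5516;</literal></character>
</kanjidic2>')

# High-level structure: name of the root element and of its direct children.
root_ok <- xml_name(xml_root(doc)) == "kanjidic2"
child_names <- xml_name(xml_children(doc))
structure_ok <- root_ok && all(child_names %in% c("header", "character"))

# Total number of kanji entries.
n_kanji <- sum(child_names == "character")
expected <- 2L  # assumption for this toy document; 13108 for the real file
if (n_kanji != expected) {
  warning("unexpected number of kanji: ", n_kanji)
}
```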

If output = "list", additional validity checks of the lower level structure are performed. Most are in accordance with the file's Document Type Definition (DTD). Some additional checks concern common patterns that hold for the current KANJIDIC2 file (as of December 2023) and seem unlikely to change in the near future. For example, there is always at most one rmgroup entry in reading_meaning. Informative warnings are provided if any of these additional checks fail.
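The rmgroup pattern mentioned above can be checked along the following lines. This is only a sketch under stated assumptions, not the code used by read_kanjidic2: the toy document is hand-written, its second entry deliberately violates the pattern, and the variable names and warning text are hypothetical.

```r
library(xml2)

# Hand-written toy document; the second entry has two rmgroup elements
# on purpose so that the check has something to report.
doc <- read_xml('<kanjidic2>
  <character>
    <literal>&#x4E9C;</literal>
    <reading_meaning>
      <rmgroup><meaning>Asia</meaning></rmgroup>
    </reading_meaning>
  </character>
  <character>
    <literal>&#x5516;</literal>
    <reading_meaning>
      <rmgroup><meaning>mute</meaning></rmgroup>
      <rmgroup><meaning>dumb</meaning></rmgroup>
    </reading_meaning>
  </character>
</kanjidic2>')

chars <- xml_find_all(doc, "/kanjidic2/character")

# Number of rmgroup elements inside reading_meaning, per character entry.
n_rmgroup <- vapply(
  seq_along(chars),
  function(i) length(xml_find_all(chars[[i]], "./reading_meaning/rmgroup")),
  integer(1)
)

# Report entries that break the "at most one rmgroup" pattern.
bad <- which(n_rmgroup > 1L)
if (length(bad) > 0) {
  warning("more than one rmgroup in reading_meaning for entry ",
          paste(bad, collapse = ", "), call. = FALSE)
}
```

A warning rather than an error fits the description above: the DTD itself allows several rmgroup elements, so a violation signals an unexpected change in the file, not an invalid one.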

Value

If output = "xml", the exact XML document obtained from xml2::read_xml. If output = "list", a list of lists (the individual kanji), each with the following seven components.

See Also

kanjidata, kreadmean

Examples

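if (interactive()) {
  # Parse a local copy into the standardized list (the default output)
  kdic <- read_kanjidic2("kanjidic2.xml")
  # Keep the document as an xml_document instead
  kxml <- read_kanjidic2("kanjidic2.xml", output = "xml")
}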


[Package kanjistat version 0.14.1 Index]