R: Data on kanji

kanjidata {kanjistat}

R Documentation

Data on kanji

Description

The tibbles kbase and kmorph provide basic and morphologic information, respectively, for all kanji contained in the KANJIDIC2 file (see below).

Usage

kbase

kmorph

Format

kbase is a tibble with 13,108 rows and 13 variables:

kanji: the kanji
unicode: the Unicode codepoint
strokes: the number of strokes
class: one of four classes: "kyouiku", "jouyou", "jinmeiyou" or "hyougai"
grade: a number from 1 to 11, essentially a finer version of class; same as in KANJIDIC2, except that all hyougaiji are assigned 11 (rather than an NA value)
kanken: at what level the kanji appears in the Nihon Kanji Nouryoku Kentei (Kanken)
jlpt: at what level the kanji appears in the Japanese Language Proficiency Test (Nihongo Nouryoku Shiken)
wanikani: at what level the kanji is learned on the kanji learning website Wanikani
frank: the frequency rank (1 = most frequent) "based on several averages (Wikipedia, novels, newspapers, ...)"
frank_news: the frequency rank (1 = most frequent) based on newspaper data (2501 most frequent kanji over four years in the Mainichi Shimbun)
read_on: a single ON reading in katakana
read_kun: a single kun reading in hiragana
mean: a single English meaning of the kanji
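
As an illustration of how these columns can be combined, the following sketch selects kyouiku kanji with at most five strokes, orders them by frequency rank, and cross-tabulates grade against class. It assumes that the data are lazy-loaded, so that kanjistat::kbase is available once the package is installed. So that the snippet also runs without the package, it falls back to a three-row stand-in data frame; that stand-in and its frank values are made up for illustration and are not part of kanjistat.

```r
# Use the packaged tibble when kanjistat is installed (assumes lazy-loaded data);
# otherwise fall back to a tiny stand-in with the same column names.
# The frank values in the stand-in are invented for illustration only.
kb <- if (requireNamespace("kanjistat", quietly = TRUE)) {
  kanjistat::kbase
} else {
  data.frame(
    kanji   = c("一", "日", "鬱"),
    strokes = c(1L, 4L, 29L),
    class   = c("kyouiku", "kyouiku", "jouyou"),
    grade   = c(1, 1, 8),
    frank   = c(2, 1, 2000),
    stringsAsFactors = FALSE
  )
}

# Kyouiku kanji with at most five strokes and a known frequency rank
keep <- which(kb$class == "kyouiku" & kb$strokes <= 5 & !is.na(kb$frank))
easy <- kb[keep, c("kanji", "strokes", "grade", "frank")]

# Most frequent first (rank 1 = most frequent)
easy <- easy[order(easy$frank), ]
head(easy, 10)

# Cross-tabulate grade against class to see how the two variables line up
grade_by_class <- table(grade = kb$grade, class = kb$class, useNA = "ifany")
grade_by_class
```

Base R subsetting is used here so that the sketch needs no packages beyond kanjistat itself; the same selection could equally be written with dplyr verbs.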

kmorph is a tibble with 13,108 rows and 15 variables:

kanji: the kanji
strokes: the number of strokes
radical: the traditional (Kangxi) radical used for indexing kanji (one of 214)
radvar: the variant of the radical if it is different, otherwise NA
nelson_c: the Nelson radical if it differs from the traditional one, otherwise NA
idc: ideographic description character (plus sometimes a number or a letter) describing the shape of the kanji
components: visible components of the kanji; originally from KRADFILE
skip: the kanji's SKIP code
mean: a single English meaning of the kanji (same as in kbase)
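
A minimal sketch of a morphological summary: counting how many kanji are indexed under each traditional radical and computing the median stroke count per radical. It assumes that radical is stored as a plain character (or factor) column and that the data are lazy-loaded as kanjistat::kmorph. The four-row fallback data frame is a stand-in so that the snippet runs without the package; it is not part of kanjistat.

```r
# Use the packaged tibble when kanjistat is installed (assumes lazy-loaded data);
# otherwise fall back to a small stand-in with the same column names.
km <- if (requireNamespace("kanjistat", quietly = TRUE)) {
  kanjistat::kmorph
} else {
  data.frame(
    kanji   = c("一", "日", "明", "鬱"),
    strokes = c(1L, 4L, 8L, 29L),
    radical = c("一", "日", "日", "鬯"),
    stringsAsFactors = FALSE
  )
}

# Number of kanji indexed under each traditional radical, largest groups first
rad_counts <- sort(table(km$radical), decreasing = TRUE)
head(rad_counts, 5)

# Median stroke count per radical, shown in the same order as the counts
med_strokes <- tapply(km$strokes, km$radical, median, na.rm = TRUE)
head(med_strokes[names(rad_counts)], 5)
```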

Details

The single ON and kun readings and the single meaning are meant for easy identification of the more difficult kanji. Each is the first entry in the KANJIDIC2 file, which may not always be the most important one. For full readings and meanings, use the function lookup or consult a dictionary.

Source

Most of the data is taken directly from the KANJIDIC2 file. https://www.edrdg.org/wiki/index.php/KANJIDIC_Project
Variables jlpt, frank, idc, components were taken from the Kanjium database. https://github.com/mifunetoshiro/kanjium
Variable components is originally from RADKFILE/KRADFILE. https://www.edrdg.org/

The use of this data is covered in each case by a Creative Commons BY-SA 4.0 License. See the package's LICENSE file for details and copyright holders.

Variable "class" is derived from "grade".
Variable "kanken" was compiled based on the Wikipedia description of the test levels (as of September 2022).

[Package kanjistat version 0.14.1 Index]