kanjidata {kanjistat} | R Documentation |
Data on kanji
Description
The tibbles kbase and kmorph provide basic and morphological information, respectively, for all kanji contained in the KANJIDIC2 file (see below).
Usage
kbase
kmorph
Format
kbase is a tibble with 13,108 rows and 13 variables:
- kanji
the kanji
- unicode
the Unicode codepoint
- strokes
the number of strokes
- class
one of four classes: "kyouiku", "jouyou", "jinmeiyou" or "hyougai"
- grade
a number from 1 to 11, essentially a finer version of class; same as in KANJIDIC2, except that we assigned 11 to all hyougaiji (rather than an NA value)
- kanken
at what level the kanji appears in the Nihon Kanji Nouryoku Kentei (Kanken)
- jlpt
at what level the kanji appears in the Japanese Language Proficiency Test (Nihongo Nouryoku Shiken)
- wanikani
at what level the kanji is learned on the kanji learning website Wanikani
- frank
the frequency rank (1 = most frequent) "based on several averages (Wikipedia, novels, newspapers, ...)"
- frank_news
the frequency rank (1 = most frequent) based on newspaper data (2501 most frequent kanji over four years in the Mainichi Shimbun)
- read_on
a single ON reading in katakana
- read_kun
a single kun reading in hiragana
- mean
a single English meaning of the kanji
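As an illustration of how the columns above might be used, the following minimal sketch subsets by class and summarises stroke counts. So that it runs without the package, it builds a small toy data frame (the name kb and its four rows are assumptions for illustration only); with kanjistat installed, kbase itself could presumably be used in its place.

```r
# Toy stand-in for kbase with a few of its columns (illustrative rows only).
# With the package installed, kanjistat::kbase could be used instead.
kb <- data.frame(
  kanji   = c("一", "日", "亜", "鬱"),
  strokes = c(1L, 4L, 7L, 29L),
  class   = c("kyouiku", "kyouiku", "jouyou", "jouyou"),
  grade   = c(1L, 1L, 8L, 8L),
  stringsAsFactors = FALSE
)

# Keep only the kyouiku kanji, ordered by stroke count
kyouiku <- kb[kb$class == "kyouiku", ]
kyouiku <- kyouiku[order(kyouiku$strokes), ]

# Mean stroke count per class
mean_strokes <- tapply(kb$strokes, kb$class, mean)
print(kyouiku)
print(mean_strokes)
```

Base R indexing is used here so that the sketch has no dependencies; the same steps work unchanged on a tibble.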
kmorph is a tibble with 13,108 rows and 15 variables:
- kanji
the kanji
- strokes
the number of strokes
- radical
the traditional (Kangxi) radical used for indexing kanji (one of 214)
- radvar
the variant of the radical if it is different, otherwise NA
- nelson_c
the Nelson radical if it differs from the traditional one, otherwise NA
- idc
ideographic description character (plus sometimes a number or a letter) describing the shape of the kanji
- components
visible components of the kanji; originally from KRADFILE
- skip
the kanji's SKIP code
- mean
a single English meaning of the kanji (same as in kbase)
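A SKIP code consists of three numbers separated by hyphens, the first of which gives the overall pattern (1 = left-right, 2 = up-down, 3 = enclosure, 4 = solid). The sketch below shows one way such codes could be split into numeric parts. It assumes the skip column holds strings of the form "1-4-4"; the example codes and the column names pattern, n1 and n2 are hypothetical and not part of the package.

```r
# Example SKIP codes (illustrative values, assumed to be "a-b-c" strings)
skip <- c("1-4-4", "2-3-6", "4-4-1")

# Split each code at the hyphens and bind the pieces into an integer matrix
parts <- do.call(rbind, strsplit(skip, "-", fixed = TRUE))
storage.mode(parts) <- "integer"
colnames(parts) <- c("pattern", "n1", "n2")

# Translate the first number into the name of the SKIP pattern
pattern_names <- c("left-right", "up-down", "enclosure", "solid")
pattern <- pattern_names[parts[, "pattern"]]
print(parts)
print(pattern)
```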
Details
The single ON and kun readings and the single meaning are for easy identification
of the more difficult kanji. Each is the first entry in the KANJIDIC2 file, which may not
always be the most important one. For full readings and meanings, use the function lookup
or consult a dictionary.
Source
Most of the data is directly from the KANJIDIC2 file.
https://www.edrdg.org/wiki/index.php/KANJIDIC_Project
Variables jlpt, frank, idc, components were taken from the Kanjium database.
https://github.com/mifunetoshiro/kanjium
Variable components is originally from RADKFILE/KRADFILE.
https://www.edrdg.org/
The use of this data is covered in each case by a Creative Commons BY-SA 4.0 License. See the package's LICENSE file for details and copyright holders.
Variable "class" is derived from "grade".
Variable "kanken" was compiled based on the Wikipedia description of the test levels (as of September 2022).
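To make the relation between "grade" and "class" concrete, here is a minimal sketch of one possible mapping. It assumes the KANJIDIC2 grade convention (1 to 6 for kyouiku, 8 for the remaining jouyou, 9 and 10 for jinmeiyou) together with the value 11 for hyougaiji described above; the package's actual derivation is not shown in this documentation and may differ.

```r
# Example grade values (KANJIDIC2 does not use grade 7)
grade <- c(1L, 6L, 8L, 9L, 10L, 11L)

# Assumed mapping: (0,6] kyouiku, (6,8] jouyou, (8,10] jinmeiyou, (10,11] hyougai
class_from_grade <- as.character(cut(
  grade,
  breaks = c(0, 6, 8, 10, 11),
  labels = c("kyouiku", "jouyou", "jinmeiyou", "hyougai")
))
print(data.frame(grade, class = class_from_grade))
```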