kanjidata {kanjistat}    R Documentation

Data on kanji

Description

The tibbles kbase and kmorph provide basic and morphological information, respectively, for all kanji contained in the KANJIDIC2 file (see below).

Usage

kbase

kmorph

Format

kbase is a tibble with 13,108 rows and 13 variables:

kanji

the kanji

unicode

the Unicode codepoint

strokes

the number of strokes

class

one of four classes: "kyouiku", "jouyou", "jinmeiyou" or "hyougai"

grade

a number from 1 to 11, essentially a finer version of class; same as in KANJIDIC2, except that all hyougaiji are assigned 11 (rather than an NA value)

kanken

at what level the kanji appears in the Nihon Kanji Nouryoku Kentei (Kanken)

jlpt

at what level the kanji appears in the Japanese Language Proficiency Test (Nihongo Nouryoku Shiken)

wanikani

at what level the kanji is learned on the kanji learning website Wanikani

frank

the frequency rank (1 = most frequent) "based on several averages (Wikipedia, novels, newspapers, ...)"

frank_news

the frequency rank (1 = most frequent) based on newspaper data (2501 most frequent kanji over four years in the Mainichi Shimbun)

read_on

a single ON reading in katakana

read_kun

a single kun reading in hiragana

mean

a single English meaning of the kanji
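The columns described above can be combined with ordinary data frame operations. The following is a minimal sketch, assuming the package is installed, that strokes and frank are stored as numbers, and that class holds the four strings listed above; the object names simple and top10 are chosen for illustration only.

```r
library(kanjistat)

# Kyouiku kanji written with at most three strokes
simple <- subset(kbase, class == "kyouiku" & strokes <= 3)
simple[, c("kanji", "strokes", "grade", "mean")]

# The ten most frequent kanji by overall frequency rank
# (order() places missing ranks last by default)
top10 <- head(kbase[order(kbase$frank), ], 10)
top10[, c("kanji", "frank", "read_on", "read_kun", "mean")]
```

Base R subsetting is used here so that no further packages are needed; since kbase is a tibble, the same selections could also be written with dplyr verbs.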

kmorph is a tibble with 13,108 rows and 15 variables:

kanji

the kanji

strokes

the number of strokes

radical

the traditional (Kangxi) radical used for indexing kanji (one of 214)

radvar

the variant of the radical if it is different, otherwise NA

nelson_c

the Nelson radical if it differs from the traditional one, otherwise NA

idc

ideographic description character (plus sometimes a number or a letter) describing the shape of the kanji

components

visible components of the kanji; originally from KRADFILE

skip

the kanji's SKIP code

mean

a single English meaning of the kanji (same as in kbase)
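Because both tibbles contain a kanji column, they can be joined on it. The sketch below assumes that kanji identifies each row uniquely in both tables (both have 13,108 rows) and that the package is installed; km and rad_counts are illustrative names.

```r
library(kanjistat)

# Attach the class from kbase to the morphological data via the shared kanji column
km <- merge(as.data.frame(kmorph[, c("kanji", "radical", "skip")]),
            as.data.frame(kbase[, c("kanji", "class")]),
            by = "kanji")

# Five most common traditional radicals among the kyouiku kanji
rad_counts <- sort(table(km$radical[km$class == "kyouiku"]), decreasing = TRUE)
head(rad_counts, 5)
```

The conversion with as.data.frame is a precaution so that merge operates on plain data frames; it does not change the contents.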

Details

The single ON and kun readings and the single meaning are provided for easy identification of the more difficult kanji. Each is the first entry in the KANJIDIC2 file, which may not always be the most important one. For full readings and meanings, use the function lookup or consult a dictionary.
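As a rough illustration of the difference, the sketch below first extracts the single-entry columns for one kanji from kbase and then calls lookup for the same kanji. That lookup accepts a character string containing the kanji as its first argument is an assumption here; see the help page of lookup for its actual interface. The name first_entry is illustrative.

```r
library(kanjistat)

# Single (first) readings and meaning as stored in kbase
first_entry <- kbase[which(kbase$kanji == "北"),
                     c("kanji", "read_on", "read_kun", "mean")]
first_entry

# Full readings and meanings (assumed call form: kanji passed as a string)
lookup("北")
```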

Source

Most of the data is taken directly from the KANJIDIC2 file: https://www.edrdg.org/wiki/index.php/KANJIDIC_Project
Variables jlpt, frank, idc and components were taken from the Kanjium database: https://github.com/mifunetoshiro/kanjium
Variable components is originally from RADKFILE/KRADFILE: https://www.edrdg.org/

The use of this data is covered in each case by a Creative Commons BY-SA 4.0 License. See the package's LICENSE file for details and copyright holders.

Variable "class" is derived from "grade".
Variable "kanken" was compiled based on the Wikipedia description of the test levels (as of September 2022).


[Package kanjistat version 0.14.1 Index]