Japanese Text Processing Tools

Documentation for package ‘audubon’ version 0.5.2

bind_lr	Bind importance of bigrams
bind_tf_idf2	Bind term frequency and inverse document frequency
collapse_tokens	Collapse sequences of tokens by condition
get_dict_features	Get dictionary's features
hiroba	Whole tokens of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko
lex_density	Calculate lexical density
mute_tokens	Mute tokens by condition
ngram_tokenizer	N-gram tokenizer
pack	Pack a data.frame of tokens
polano	Whole text of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko
prettify	Prettify tokenized output
read_rewrite_def	Read a rewrite.def file
strj_fill_iter_mark	Fill Japanese iteration marks
strj_hiraganize	Hiraganize Japanese characters
strj_katakanize	Katakanize Japanese characters
strj_normalize	Convert text following the rules of 'NEologd'
strj_rewrite_as_def	Rewrite text using rewrite.def
strj_romanize	Romanize Japanese Hiragana and Katakana
strj_segment	Segment text into tokens
strj_tinyseg	Segment text into phrases
strj_tokenize	Split text into tokens
strj_transcribe_num	Transcribe Arabic numerals to Kansuji