Japanese Text Processing Tools

Documentation for package ‘audubon’ version 0.5.1

Help Pages

bind_lr              Bind importance of bigrams
bind_tf_idf2         Bind term frequency and inverse document frequency
collapse_tokens      Collapse sequences of tokens by condition
get_dict_features    Get dictionary's features
hiroba               Whole tokens of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko
lex_density          Calculate lexical density
mute_tokens          Mute tokens by condition
ngram_tokenizer      Ngrams tokenizer
pack                 Pack a data.frame of tokens
polano               Whole text of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko
prettify             Prettify tokenized output
read_rewrite_def     Read a rewrite.def file
strj_fill_iter_mark  Fill Japanese iteration marks
strj_hiraganize      Hiraganize Japanese characters
strj_katakanize      Katakanize Japanese characters
strj_normalize       Convert text following the rules of 'NEologd'
strj_rewrite_as_def  Rewrite text using rewrite.def
strj_romanize        Romanize Japanese Hiragana and Katakana
strj_segment         Segment text into tokens
strj_tinyseg         Segment text into phrases
strj_tokenize        Split text into tokens
strj_transcribe_num  Transcribe Arabic to Kansuji