| Topic | Description |
|---|---|
| bind_lr | Bind importance of bigrams |
| bind_tf_idf2 | Bind term frequency and inverse document frequency |
| collapse_tokens | Collapse sequences of tokens by condition |
| get_dict_features | Get dictionary features |
| hiroba | Whole tokens of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko |
| lex_density | Calculate lexical density |
| mute_tokens | Mute tokens by condition |
| ngram_tokenizer | N-gram tokenizer |
| pack | Pack a data.frame of tokens |
| polano | Whole text of 'Porano no Hiroba' written by Miyazawa Kenji from Aozora Bunko |
| prettify | Prettify tokenized output |
| read_rewrite_def | Read a rewrite.def file |
| strj_fill_iter_mark | Fill Japanese iteration marks |
| strj_hiraganize | Hiraganize Japanese characters |
| strj_katakanize | Katakanize Japanese characters |
| strj_normalize | Convert text following the rules of 'NEologd' |
| strj_rewrite_as_def | Rewrite text using rewrite.def |
| strj_romanize | Romanize Japanese Hiragana and Katakana |
| strj_segment | Segment text into tokens |
| strj_tinyseg | Segment text into phrases |
| strj_tokenize | Split text into tokens |
| strj_transcribe_num | Transcribe Arabic numerals to Kansuji |