gbs_tokenize {gibasa} | R Documentation
Tokenize sentences using 'MeCab'
Description
Tokenize sentences using 'MeCab'
Usage
gbs_tokenize(
  x,
  sys_dic = "",
  user_dic = "",
  split = FALSE,
  partial = FALSE,
  mode = c("parse", "wakati")
)
Arguments
x
A data.frame-like object or a character vector to be tokenized.
sys_dic
Character scalar; path to the system dictionary for 'MeCab'. Note that the system dictionary is expected to be compiled with UTF-8, not Shift-JIS or other encodings.
user_dic
Character scalar; path to the user dictionary for 'MeCab'.
split
Logical. When passed as TRUE, the function splits sentences into sub-sentences using punctuation marks before tokenizing them.
partial
Logical. When passed as TRUE, activates partial parsing mode.
mode
Character scalar to switch the output format; either "parse" or "wakati".
Value
A tibble or a named list of tokens.
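A minimal usage sketch. This assumes gibasa is installed together with a working MeCab installation and a UTF-8 system dictionary; the input sentence and element names here are illustrative only.

```r
library(gibasa)

# A named character vector; names are carried over as document IDs.
sentences <- c(doc1 = "こんにちは、世界。")

# Default "parse" mode returns a tibble with one row per token.
res <- gbs_tokenize(sentences, mode = "parse")

# "wakati" mode instead returns a named list of token vectors.
toks <- gbs_tokenize(sentences, mode = "wakati")
```

If a custom dictionary is needed, pass its path via `sys_dic` (and optionally `user_dic`); remember that the dictionary must be compiled with UTF-8 encoding.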