build_sys_dic {gibasa}    R Documentation
Build system dictionary
Description
Builds a UTF-8 system dictionary from source dictionary files.
Usage
build_sys_dic(dic_dir, out_dir, encoding)
Arguments
dic_dir
    Directory where the source dictionaries are located. Passed to the dictionary compiler as the value of its '-d' option.
out_dir
    Directory where the binary dictionary will be written. Passed to the dictionary compiler as the value of its '-o' option.
encoding
    Encoding of the input CSV files. Passed to the dictionary compiler as the value of its '-f' option.
Details
This function is a wrapper around the dictionary compiler of 'MeCab'.
Running this function creates four files in out_dir: 'char.bin', 'matrix.bin', 'sys.dic', and 'unk.dic'.
To use the compiled dictionary, you also need a dicrc file in out_dir. A dicrc file is included with the source dictionaries, so you can simply copy it to out_dir.
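As a minimal sketch of how the result could be checked, the following base R helper reports which of the files named above (the four compiled files plus dicrc) are present in a directory. The name check_compiled_dic is hypothetical and is not part of 'gibasa'.

```r
# Hypothetical helper (not part of 'gibasa'): report which of the files
# that a usable compiled dictionary directory needs are present.
check_compiled_dic <- function(out_dir) {
  # The four files written by the compiler, plus the 'dicrc' file
  # that has to be copied in separately.
  expected <- c("char.bin", "matrix.bin", "sys.dic", "unk.dic", "dicrc")
  present <- file.exists(file.path(out_dir, expected))
  names(present) <- expected
  present
}
```

Calling all(check_compiled_dic(out_dir)) after building the dictionary and copying dicrc gives a single logical that indicates whether every expected file is in place; a FALSE entry in the named vector points at the missing file.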
Value
TRUE is returned invisibly if the dictionary is built successfully.
Examples
if (requireNamespace("withr")) {
  # create a sample dictionary in temporary directory
  build_sys_dic(
    dic_dir = system.file("latin", package = "gibasa"),
    out_dir = tempdir(),
    encoding = "utf8"
  )
  # copy the 'dicrc' file
  file.copy(
    system.file("latin/dicrc", package = "gibasa"),
    tempdir()
  )
  # mocking a 'mecabrc' file to temporarily use the dictionary
  withr::with_envvar(
    c(
      "MECABRC" = if (.Platform$OS.type == "windows") {
        "nul"
      } else {
        "/dev/null"
      },
      "RCPP_PARALLEL_BACKEND" = "tinythread"
    ),
    {
      tokenize("katta-wokattauresikatta", sys_dic = tempdir())
    }
  )
}
[Package gibasa version 1.1.1 Index]