build_sys_dic {gibasa}R Documentation

Build system dictionary

Description

Builds a UTF-8 system dictionary from source dictionary files.

Usage

build_sys_dic(dic_dir, out_dir, encoding)

Arguments

dic_dir

Directory where the source dictionaries are located. This argument is passed as '-d' option argument.

out_dir

Directory where the binary dictionary will be written. This argument is passed as '-o' option argument.

encoding

Encoding of input csv files. This argument is passed as '-f' option argument.

Details

This function is a wrapper around dictionary compiler of 'MeCab'.

Running this function will create 4 files: 'char.bin', 'matrix.bin', 'sys.dic', and 'unk.dic' in out_dir.

To use these compiled dictionary, you also need create a dicrc file in out_dir. A dicrc file is included in source dictionaries, so you can just copy it to out_dir.

Value

A TRUE is invisibly returned if dictionary is successfully built.

Examples


if (requireNamespace("withr")) {
  # create a sample dictionary in temporary directory
  build_sys_dic(
    dic_dir = system.file("latin", package = "gibasa"),
    out_dir = tempdir(),
    encoding = "utf8"
  )
  # copy the 'dicrc' file
  file.copy(
    system.file("latin/dicrc", package = "gibasa"),
    tempdir()
  )
  # mocking a 'mecabrc' file to temporarily use the dictionary
  withr::with_envvar(
    c(
      "MECABRC" = if (.Platform$OS.type == "windows") {
        "nul"
      } else {
        "/dev/null"
      },
      "RCPP_PARALLEL_BACKEND" = "tinythread"
    ),
    {
      tokenize("katta-wokattauresikatta", sys_dic = tempdir())
    }
  )
}


[Package gibasa version 1.1.0 Index]