RmecabKo {RmecabKo} | R Documentation |
Rcpp Wrapper for Eunjeon Project
Description
mecab-ko and mecab-ko-dic are based on a C++ library, and POS tagging with them is useful when the spacing of the source text is not correct.
To integrate mecab-ko with R, the Rcpp package is used to provide the basic framework.
Details
This package is based on the Eunjeon Project.
On Mac OS X and Linux, you need to install mecab-ko and mecab-ko-dic before installing this package in R.
mecab-ko: https://bitbucket.org/eunjeon/mecab-ko
mecab-ko-dic: https://bitbucket.org/eunjeon/mecab-ko-dic
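Before installing the package on Mac OS X or Linux, it can help to confirm that mecab-ko is already present. The sketch below is only a heuristic: it assumes the mecab-ko build places an executable named mecab on the PATH, and has_mecab() is a hypothetical helper, not part of RmecabKo. It does not verify that mecab-ko-dic is installed.

```r
# Hypothetical helper (not part of RmecabKo): returns TRUE when an
# executable with the given name can be found on the PATH.
# Assumption: the mecab-ko build installs an executable named "mecab".
has_mecab <- function(cmd = "mecab") {
  path <- Sys.which(cmd)   # "" when the command is not found
  nzchar(unname(path))
}

if (has_mecab()) {
  message("mecab found at: ", Sys.which("mecab"))
} else {
  message("mecab not found on PATH; install mecab-ko and mecab-ko-dic first.")
}
```

A TRUE result only shows that some mecab executable is reachable; the package itself links against the installed library.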
On Windows, the install_mecab(mecabLocation) function installs mecab-ko-msvc and mecab-ko-dic-msvc in a user-specified directory.
Because the Windows version operates through system commands and file I/O, analysis is slower than on Linux-based operating systems.
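A script shared across platforms may want to call install_mecab() only on Windows. The following is a minimal sketch under that assumption: install_mecab_if_windows() is a hypothetical wrapper, not part of RmecabKo, and the directory shown in the usage comment is a placeholder.

```r
# Hypothetical wrapper (not part of RmecabKo): call install_mecab() only
# when running on Windows and when the RmecabKo package is available.
install_mecab_if_windows <- function(mecab_dir) {
  is_windows <- .Platform$OS.type == "windows"
  if (is_windows && requireNamespace("RmecabKo", quietly = TRUE)) {
    RmecabKo::install_mecab(mecab_dir)
    return(invisible(TRUE))
  }
  invisible(FALSE)  # nothing installed on other platforms
}

# Usage (placeholder directory):
# install_mecab_if_windows("D:/Rlibs/mecab")
```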
Author(s)
Junhewk Kim
References
Wonsup Yoon, mecab-ko VC++ builds at https://github.com/Pusnow/mecab-ko-msvc, https://github.com/Pusnow/mecab-ko-dic-msvc
Examples
## Not run:
# install.packages("devtools")
devtools::install_github("junhewk/RmecabKo")
# On Windows platform only
install_mecab("D:/Rlibs/mecab")
# Illustrative sample input; replace with your own Korean character vector
phrase <- c("안녕하세요.", "한국어 형태소 분석을 합니다.")
# For full POS tagging
pos(phrase)
# For noun extraction only
nouns(phrase)
# For tokenizing of selective morphemes
tokens_words(phrase)
# For n-grams tokenizing
tokens_ngram(phrase)
## End(Not run)