jiebaR {jiebaR} | R Documentation
A package for Chinese text segmentation
Description
A package for Chinese text segmentation, keyword extraction, and part-of-speech tagging, built on Rcpp and cppjieba.
Details
You can supply a custom dictionary. jiebaR can also identify new words on its own, but adding those words to the dictionary gives higher accuracy.
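As a minimal sketch of the custom-dictionary workflow described above: the code below assumes the `user` argument of `worker()` and the `new_user_word()` helper are available in the installed jiebaR version, and that a user dictionary may list one word per line. The dictionary contents, the temporary file, and the sample sentence are hypothetical and only for illustration.

```r
library(jiebaR)

# Hypothetical user dictionary: one word per line, written to a temp file.
user_dict <- tempfile(fileext = ".txt")
writeLines(c("jiebaR", "cppjieba"), user_dict)

# Assumption: worker() accepts a `user` argument pointing at the dictionary.
engine <- worker(user = user_dict)

# Assumption: new_user_word() registers extra words on an existing worker.
new_user_word(engine, "Rcpp")

# Segment a sample sentence with the customized worker.
result <- segment("jiebaR wraps cppjieba with Rcpp", engine)
print(result)
```

Loading the dictionary at worker creation suits a fixed vocabulary, while adding words to an existing worker may be more convenient when terms are discovered during a session.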
Author(s)
Qin Wenfeng <http://qinwenfeng.com>
References
CppJieba: https://github.com/aszxqw/cppjieba
See Also
jiebaR: https://github.com/qinwf/jiebaR
Examples
### Note: Chinese characters cannot be displayed here.
## Not run:
words = "hello world"
engine1 = worker()
segment(words, engine1)
# "./temp.txt" is a file path
segment("./temp.txt", engine1)
engine2 = worker("hmm")
segment("./temp.txt", engine2)
# write the segmentation result to a file instead of returning it
engine2$write = TRUE
segment("./temp.txt", engine2)
# "dict_path" is a placeholder for the path to a dictionary file
engine3 = worker(type = "mix", dict = "dict_path", symbol = TRUE)
segment("./temp.txt", engine3)
## End(Not run)
## Not run:
### Keyword Extraction
engine = worker("keywords", topn = 1)
keywords(words, engine)
### Part-of-Speech Tagging
tagger = worker("tag")
tagging(words, tagger)
### Simhash
simhasher = worker("simhash", topn = 1)
simhash(words, simhasher)
distance("hello world", "hello world!", simhasher)
show_dictpath()
## End(Not run)
[Package jiebaR version 0.11 Index]