token_morph {RmecabKo} | R Documentation |
Morpheme tokenizer based on mecab-ko
Description
These tokenizer functions split text into all morphemes, selected morphemes, or nouns only.
Usage
token_morph(phrase, strip_punct = FALSE, strip_numeric = FALSE)
token_words(phrase, strip_punct = FALSE, strip_numeric = FALSE)
token_nouns(phrase, strip_punct = FALSE, strip_numeric = FALSE)
Arguments
phrase |
A character vector or a list of character vectors to be tokenized into morphemes.
If |
strip_punct |
Logical. Set to TRUE to remove punctuation from the phrase. |
strip_numeric |
Logical. Set to TRUE to remove numbers from the phrase. |
Value
A list of character vectors containing the tokens, with one element in the list for each element of the input.
See the examples on GitHub.
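As a minimal sketch of working with a return value of this shape, the base R code below uses a hand-written stand-in list rather than real tokenizer output. The object `tokens` and its contents are hypothetical and were not produced by RmecabKo; only the list-of-character-vectors structure is taken from the description above.

```r
# Hypothetical stand-in for a tokenizer result: a list holding one character
# vector of tokens per input element. The tokens are hand-written for
# illustration and are NOT actual output from token_nouns().
tokens <- list(
  c("날씨", "오늘"),
  c("오늘", "학교", "친구")
)

# Number of tokens found in each input element
token_counts <- lengths(tokens)

# Flatten into a single character vector and tabulate token frequencies
all_tokens <- unlist(tokens)
freq <- table(all_tokens)

print(token_counts)
print(freq)
```

Keeping the result as a list preserves which tokens came from which input; flatten it with `unlist()` only when corpus-wide counts are needed.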
Examples
## Not run:
# Any Korean text works here; this sentence is only an illustrative placeholder.
txt <- "안녕하세요. 오늘은 2024년 1월 1일입니다."
token_morph(txt)
token_words(txt, strip_punct = FALSE)
token_nouns(txt, strip_numeric = TRUE)
## End(Not run)
[Package RmecabKo version 0.1.6.2 Index]