R: Morpheme tokenizer based on mecab-ko

token_morph {RmecabKo}

R Documentation

Morpheme tokenizer based on mecab-ko

Description

These tokenizer functions split Korean text into all morphemes, selected morphemes, or nouns.

Usage

token_morph(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_words(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_nouns(phrase, strip_punct = FALSE, strip_numeric = FALSE)

Arguments

`phrase`	A character vector or a list of character vectors to be tokenized into morphemes. If `phrase` is a character vector, it can be of any length, and each element will be tokenized separately. If `phrase` is a list of character vectors, each element of the list should be a one-item vector.
`strip_punct`	Logical. Set to TRUE to remove punctuation from the phrase.
`strip_numeric`	Logical. Set to TRUE to remove numbers from the phrase.
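
The two accepted shapes of `phrase` can be sketched as below. This is only an illustration: `is_valid_phrase` is a hypothetical helper that is not part of RmecabKo, the sample sentences are placeholders, and the tokenizer call is guarded because it assumes RmecabKo and a working mecab-ko installation are available.

```r
# Hypothetical helper (not part of RmecabKo): checks the documented shapes of `phrase`
is_valid_phrase <- function(phrase) {
  if (is.character(phrase)) {
    return(TRUE)  # a character vector of any length is accepted
  }
  if (is.list(phrase)) {
    # every list element should be a one-item character vector
    ok <- vapply(phrase,
                 function(x) is.character(x) && length(x) == 1L,
                 logical(1))
    return(all(ok))
  }
  FALSE
}

# Placeholder inputs in the two documented shapes
phrases_vec  <- c("첫 번째 문장입니다.", "두 번째 문장입니다.")
phrases_list <- list("첫 번째 문장입니다.", "두 번째 문장입니다.")

is_valid_phrase(phrases_vec)         # character vector
is_valid_phrase(phrases_list)        # list of one-item vectors
is_valid_phrase(list(c("a", "b")))   # a two-item vector inside a list is rejected

# Call the tokenizer only when the package is installed (assumption: mecab-ko is set up)
if (requireNamespace("RmecabKo", quietly = TRUE)) {
  RmecabKo::token_morph(phrases_vec, strip_punct = TRUE, strip_numeric = TRUE)
}
```

Checking the shape before tokenizing is a design choice for this sketch, not something the package is documented to require.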

Value

A list of character vectors containing the tokens, with one element in the list for each element of `phrase`.
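
A result of this shape can be handled with base R alone. The sketch below uses a hand-written stand-in list rather than real tokenizer output, and `count_tokens` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper (not part of RmecabKo): number of tokens in each list element
count_tokens <- function(tokens) {
  stopifnot(is.list(tokens))
  lengths(tokens)
}

# Stand-in with the documented shape (a list of character vectors);
# these strings are placeholders, not real tokenizer results
mock_tokens <- list(c("tok1", "tok2", "tok3"), c("tok4"))

count_tokens(mock_tokens)   # token count per list element
unlist(mock_tokens)         # flatten into a single character vector
```

Flattening with `unlist()` discards the grouping of tokens by input, so keep the list form when that grouping matters.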

See the examples on GitHub.

Examples

## Not run: 
# Any Korean sentence will do; this one is only a placeholder sample
txt <- "한국어 형태소 분석을 2018년에 시작했습니다."

token_morph(txt)
token_words(txt, strip_punct = FALSE)
token_nouns(txt, strip_numeric = TRUE)

## End(Not run)

[Package RmecabKo version 0.1.6.2 Index]