token_morph {RmecabKo}    R Documentation

Morpheme tokenizer based on mecab-ko

Description

These tokenizer functions split Korean text into all morphemes, selected morphemes, or nouns only.

Usage

token_morph(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_words(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_nouns(phrase, strip_punct = FALSE, strip_numeric = FALSE)

Arguments

phrase

A character vector or a list of character vectors to be tokenized into morphemes. If phrase is a character vector, it can be of any length, and each element is tokenized separately. If phrase is a list of character vectors, each element of the list should be a one-item vector.

strip_punct

Logical. Set to TRUE to remove punctuation from the phrase. Defaults to FALSE.

strip_numeric

Logical. Set to TRUE to remove numbers from the phrase. Defaults to FALSE.

Value

A list of character vectors containing the tokens, with one element in the list for each element of phrase.

See the examples on GitHub.

Examples

## Not run: 
# Placeholder sentence; substitute any Korean text
txt <- "안녕하세요. 한국어 문장입니다."

token_morph(txt)
token_words(txt, strip_punct = FALSE)
token_nouns(txt, strip_numeric = TRUE)

## End(Not run)
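
The two input shapes described for phrase can be sketched as follows. This is a minimal illustration, not part of the package: the sample sentences are made up, and is_valid_phrase is a hypothetical helper that only checks the shape stated in the Arguments section. The call to token_morph is guarded because it needs both RmecabKo and a working mecab-ko installation.

```r
# Shape 1: a character vector of any length; each element is tokenized separately.
phrase_vec <- c("안녕하세요.", "한국어 형태소 분석을 합니다.")

# Shape 2: a list whose elements are one-item character vectors.
phrase_list <- list("안녕하세요.", "한국어 형태소 분석을 합니다.")

# Hypothetical helper (not provided by RmecabKo): TRUE when `x` matches
# one of the two shapes documented for `phrase`.
is_valid_phrase <- function(x) {
  if (is.character(x)) {
    return(TRUE)
  }
  is.list(x) &&
    all(vapply(x, function(el) is.character(el) && length(el) == 1L, logical(1)))
}

is_valid_phrase(phrase_vec)
is_valid_phrase(phrase_list)
is_valid_phrase(list(c("a", "b")))  # a two-item vector inside a list does not match

# Only attempt tokenization when the package is available; try() keeps the
# script running if mecab-ko itself is missing.
if (requireNamespace("RmecabKo", quietly = TRUE)) {
  tokens <- try(RmecabKo::token_morph(phrase_vec), silent = TRUE)
}
```

Checking the input shape up front gives a clearer error than passing a malformed list to the tokenizer.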


[Package RmecabKo version 0.1.6.2 Index]