tokenizers {Unicode}    R Documentation
Unicode Alphabetic Tokenizer
Description
A simple Unicode alphabetic tokenizer.
Usage
Unicode_alphabetic_tokenizer(x)
Arguments
x    a character vector.
Details
Tokenization first replaces the elements of x by their Unicode
character sequences. The non-alphabetic characters (i.e., those that
do not have the Unicode Alphabetic property) are then replaced by
blanks, and the resulting strings are split at the blanks.
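The steps above can be sketched in base R. This is only an approximation and not the package's implementation: the helper name approx_alphabetic_tokenizer is hypothetical, and it uses the PCRE letter class \p{L} as a stand-in for the Alphabetic property, which is broader than \p{L} (it also covers letter numbers and some marks, for example). Dropping the empty pieces left by adjacent blanks is a choice made for this sketch.

```r
# Hypothetical helper, not part of the Unicode package.
# Assumption: \p{L} (letters) approximates the Unicode Alphabetic property.
approx_alphabetic_tokenizer <- function(x) {
  x <- enc2utf8(as.character(x))
  # Replace every non-letter character by a blank.
  blanked <- gsub("\\P{L}", " ", x, perl = TRUE)
  # Split at the blanks and flatten the result into one character vector.
  tokens <- unlist(strsplit(blanked, " ", fixed = TRUE), use.names = FALSE)
  # Drop the empty strings produced by runs of adjacent blanks.
  tokens[nzchar(tokens)]
}

approx_alphabetic_tokenizer("Hello, world! 42")
```

Digits and punctuation are treated as separators here, so only runs of letters survive as tokens.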
Value
A character vector with the tokenized strings.
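Examples
A minimal usage sketch, assuming the Unicode package is installed. The input strings are made up for illustration, and the call is guarded so that the code does nothing when the package is unavailable.

```r
# Assumption: the Unicode package is installed; otherwise nothing is run.
if (requireNamespace("Unicode", quietly = TRUE)) {
  x <- c("Hello, world!", "R 4.3 and Unicode 15.1")
  # Tokenize the character vector into its alphabetic runs.
  tokens <- Unicode::Unicode_alphabetic_tokenizer(x)
  print(tokens)
}
```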
[Package Unicode version 15.1.0-1 Index]