Weka_tokenizers {RWeka}
R/Weka Tokenizers
Description
R interfaces to Weka tokenizers.
Usage
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments
x: a character vector with strings to be tokenized.
control: an object of class Weka_control, or a character vector of control options, or NULL (the default).
Details
AlphabeticTokenizer is an alphabetic string tokenizer, where tokens are formed only from contiguous alphabetic sequences.
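For example (a minimal sketch, assuming RWeka is attached and a working Java runtime is available; the commented result assumes the underlying Weka tokenizer discards digits and punctuation entirely):

AlphabeticTokenizer("R2Weka: 3 tokenizers!")
## Digits and punctuation never appear inside a token, so only the
## alphabetic runs survive, e.g. "R", "Weka", "tokenizers".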
NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.
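For example, the following sketch requests all word bigrams and trigrams; the min and max options passed via Weka_control are assumed to map onto the Weka tokenizer's -min and -max settings:

NGramTokenizer("the quick brown fox",
               control = Weka_control(min = 2, max = 3))
## Yields word n-grams of two to three tokens,
## e.g. "the quick brown", "the quick", "quick brown fox", "brown fox".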
WordTokenizer is a simple word tokenizer, splitting strings at whitespace and punctuation delimiters.
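For example (again a sketch; the commented result assumes the default delimiter set covers whitespace and common punctuation):

WordTokenizer("The quick, brown fox said: hello.")
## Expect the plain words "The", "quick", "brown", "fox", "said", "hello".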
Value
A character vector with the tokenized strings.