Fast, Consistent Tokenization of Natural Language Text



Documentation for package ‘tokenizers’ version 0.3.0

Help Pages

basic-tokenizers              Basic tokenizers
chunk_text                    Chunk text into smaller segments
count_characters              Count words, sentences, characters
count_sentences               Count words, sentences, characters
count_words                   Count words, sentences, characters
mobydick                      The text of Moby Dick
ngram-tokenizers              N-gram tokenizers
tokenizers                    Tokenizers
tokenize_characters           Basic tokenizers
tokenize_character_shingles   Character shingle tokenizers
tokenize_lines                Basic tokenizers
tokenize_ngrams               N-gram tokenizers
tokenize_paragraphs           Basic tokenizers
tokenize_ptb                  Penn Treebank Tokenizer
tokenize_regex                Basic tokenizers
tokenize_sentences            Basic tokenizers
tokenize_skip_ngrams          N-gram tokenizers
tokenize_words                Basic tokenizers
tokenize_word_stems           Word stem tokenizer
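
The tokenizers above all take a character vector (or a list of character vectors) as input and return a list of character vectors, one element per input document. A brief usage sketch of the main function families listed above (the sample string is made up for illustration; see each help page for the full set of arguments):

    library(tokenizers)

    text <- "The quick brown fox jumps over the lazy dog. The dog barked."

    # Word tokens: lowercased, with punctuation stripped by default
    tokenize_words(text)

    # Sentence tokens, one element per sentence
    tokenize_sentences(text)

    # Word n-grams; n = 2 yields bigrams joined by a space
    tokenize_ngrams(text, n = 2)

    # Character shingles (character-level n-grams)
    tokenize_character_shingles(text, n = 3)

    # Counts of words, sentences, and characters
    count_words(text)
    count_sentences(text)

Each function returns a list even for a single input string; passing simplify = TRUE unwraps the result to a plain character vector when the input has length one.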