basic-tokenizers | Basic tokenizers |
chunk_text | Chunk text into smaller segments |
count_characters | Count words, sentences, characters |
count_sentences | Count words, sentences, characters |
count_words | Count words, sentences, characters |
mobydick | The text of Moby Dick |
ngram-tokenizers | N-gram tokenizers |
tokenizers | Tokenizers |
tokenize_characters | Basic tokenizers |
tokenize_character_shingles | Character shingle tokenizers |
tokenize_lines | Basic tokenizers |
tokenize_ngrams | N-gram tokenizers |
tokenize_paragraphs | Basic tokenizers |
tokenize_ptb | Penn Treebank tokenizer |
tokenize_regex | Basic tokenizers |
tokenize_sentences | Basic tokenizers |
tokenize_skip_ngrams | N-gram tokenizers |
tokenize_words | Basic tokenizers |
tokenize_word_stems | Word stem tokenizer |