R: Break Text at Spaces

tokenize_space {piecemaker}

R Documentation

Break Text at Spaces

Description

This is an extremely simple tokenizer that breaks on the space character and on nothing else. It is intended to work in tandem with prepare_text, which cleans up and inserts spaces as necessary before the tokenizer runs. The two steps are combined in prepare_and_tokenize.
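To illustrate why the text should be prepared first, here is a minimal sketch of what splitting on a single space does to unprepared input. It uses base R's strsplit() as a stand-in; split_on_space is a hypothetical helper for this illustration and is not the piecemaker implementation. Whether tokenize_space keeps or drops empty tokens is not stated on this page.

```r
# Stand-in for illustration only: base R's strsplit() with a fixed
# single-space pattern. This is NOT the piecemaker implementation.
split_on_space <- function(text) strsplit(text, " ", fixed = TRUE)

# Punctuation stays attached to the neighbouring word, because the
# only break point is the space character.
attached <- split_on_space("This is some text.")[[1]]

# Two consecutive spaces leave an empty token between them.
doubled <- split_on_space("some  text")[[1]]

print(attached)
print(doubled)
```

Both effects are the kind of thing a cleaning step such as prepare_text is meant to handle before the split happens.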

Usage

tokenize_space(text)

Arguments

text

A character vector to tokenize.

Value

The text as a list of character vectors (one vector per element of text). Each element of each vector is roughly equivalent to a word.
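The shape of the return value described above can be sketched with base R. Again, split_on_space is a hypothetical stand-in built on strsplit(), assumed here only to mirror the "one vector per element of text" structure; it is not the package's own code.

```r
# Hypothetical stand-in (base R), used only to show the list-of-vectors shape.
split_on_space <- function(text) strsplit(text, " ", fixed = TRUE)

input <- c("one two three", "four five")
tokens <- split_on_space(input)

# One list element per input string.
n_elements <- length(tokens)

# Number of tokens found in each input string.
n_tokens <- lengths(tokens)

print(n_elements)
print(n_tokens)
```

Because the result is a list, individual token vectors are extracted with [[ ]], for example tokens[[2]] for the second input string.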

Examples

tokenize_space("This is some text.")

[Package piecemaker version 1.0.2 Index]