split_text {deeplr}    R Documentation
Split texts into segments
Description
split_text splits texts into segments that do not exceed a specified maximum size in bytes.
Usage
split_text(text, max_size_bytes = 29000, tokenize = "sentences")
Arguments
text
    character vector to be split.

max_size_bytes
    maximum size of a single text segment in bytes.

tokenize
    level of tokenization. Either "sentences" or "words".
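A minimal sketch of a call with non-default arguments (not part of the package's own examples; the input string is made up for illustration):

# Split into smaller segments, tokenizing at the word level
txt <- paste(rep("Short sentence.", 500), collapse = " ")
split_text(txt, max_size_bytes = 1000, tokenize = "words")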
Details
The function uses tokenizers::tokenize_sentences to split texts.
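As a rough illustration of this tokenization step (a sketch that calls the tokenizers package directly, not code from deeplr), tokenize_sentences returns one character vector of sentences per input element, which split_text then groups into byte-limited segments:

# Sentence tokenization underlying split_text (illustrative input)
tokenizers::tokenize_sentences("First sentence. Second sentence. Third sentence.")
# returns a list with one character vector containing the three sentences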
Value
Returns a tibble with the following columns:

- text_id: position of the text in the character vector.

- segment_id: ID of a text segment.

- segment_text: text segment that is smaller than max_size_bytes.
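A minimal sketch (assuming the columns listed above; the input is illustrative) of checking that every returned segment stays within the byte limit:

# Verify the byte limit using base R's nchar(type = "bytes")
res <- split_text("One sentence. Another sentence.", max_size_bytes = 25)
all(nchar(res$segment_text, type = "bytes") <= 25)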
Examples
## Not run:
# Split long text
text <- paste0(rep("This is a very long text.", 10000), collapse = " ")
split_text(text)
## End(Not run)