split_text {deeplr}    R Documentation

Split texts into segments

Description

split_text splits texts into segments that do not exceed a maximum size in bytes.

Usage

split_text(text, max_size_bytes = 29000, tokenize = "sentences")

Arguments

text

character vector to be split.

max_size_bytes

maximum size of a single text segment in bytes.

tokenize

level of tokenization. Either "sentences" or "words".

Details

The function uses tokenizers::tokenize_sentences to split texts into sentences, which are then grouped into segments that do not exceed max_size_bytes.
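For intuition, this kind of splitting can be approximated by tokenizing a text into sentences and accumulating them until the next sentence would push the byte count past the limit. The sketch below is only illustrative (the helper name split_by_bytes and its grouping logic are assumptions, not the package's implementation); it relies on tokenizers::tokenize_sentences and nchar(type = "bytes"):

# Minimal sketch of byte-limited sentence grouping; not the deeplr code.
library(tokenizers)

split_by_bytes <- function(text, max_size_bytes = 29000) {
  sentences <- tokenize_sentences(text)[[1]]
  segments <- character(0)
  current  <- ""
  for (s in sentences) {
    candidate <- if (nchar(current) == 0) s else paste(current, s)
    if (nchar(candidate, type = "bytes") > max_size_bytes && nchar(current) > 0) {
      # flush the current segment and start a new one with this sentence
      segments <- c(segments, current)
      current  <- s
    } else {
      current  <- candidate
    }
  }
  c(segments, current)
}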

Value

Returns a tibble containing the resulting text segments.

Examples

## Not run: 
# Split long text
text <- paste0(rep("This is a very long text.", 10000), collapse = " ")
split_text(text)
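
# Illustrative variation: split at the word level with a smaller byte limit
# (the argument values here are examples, not package defaults), then
# inspect the structure of the returned tibble.
segments <- split_text(text, max_size_bytes = 1000, tokenize = "words")
str(segments)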

## End(Not run)


[Package deeplr version 2.0.1]