tokens_chunk {quanteda} | R Documentation |
Segment tokens object by chunks of a given size
Description
Segment tokens into new documents (chunks) of equal token length, with the possibility of overlapping consecutive chunks.
Usage
tokens_chunk(x, size, overlap = 0, use_docvars = TRUE)
Arguments
x |
tokens object whose token elements will be segmented into chunks |
size |
integer; the token length of the chunks |
overlap |
integer; the number of tokens in a chunk to be taken from the last overlap tokens from the preceding chunk |
use_docvars |
if TRUE, repeat the docvar values for each chunk; if FALSE, drop the docvars in the chunked tokens |
Value
A tokens object whose documents have been split into chunks of length size.
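The interaction between size and overlap can be illustrated with a small base R sketch. The helper chunk_sketch() below is hypothetical and is not part of quanteda; it assumes that each chunk starts size - overlap tokens after the previous one and that a final, shorter chunk is kept. Whether tokens_chunk() treats the trailing chunk in exactly this way should be checked against its actual output.

```r
# Hypothetical helper, not a quanteda function: split a character vector of
# tokens into windows of `size` tokens, where each window starts
# `size - overlap` tokens after the previous one (an assumed semantics).
chunk_sketch <- function(tokens, size, overlap = 0) {
  stopifnot(size >= 1, overlap >= 0, overlap < size)
  if (length(tokens) == 0) return(list())
  step <- size - overlap
  starts <- seq(1, length(tokens), by = step)
  # The last window is truncated at the end of the vector
  lapply(starts, function(i) tokens[i:min(i + size - 1, length(tokens))])
}

words <- c("a", "b", "c", "d", "e", "f", "g")
chunk_sketch(words, size = 3)               # non-overlapping windows
chunk_sketch(words, size = 3, overlap = 1)  # each window reuses one token
```

Under these assumptions, a larger overlap produces more chunks from the same document, because the window advances by fewer tokens each time.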
See Also
Examples
library("quanteda")
txts <- c(doc1 = "Fellow citizens, I am again called upon by the voice of
my country to execute the functions of its Chief Magistrate.",
doc2 = "When the occasion proper for it shall arrive, I shall
endeavor to express the high sense I entertain of this
distinguished honor.")
toks <- tokens(txts)
tokens_chunk(toks, size = 5)
tokens_chunk(toks, size = 5, overlap = 4)
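As a further sketch, the chunked result can be inspected with ntoken(), docnames() and docvars() to see how long each chunk is and what use_docvars does. The shortened texts and the docvar name "speaker" are made up for illustration; no output is shown because the exact chunk names and values depend on the installed quanteda version.

```r
library("quanteda")

toks <- tokens(c(doc1 = "Fellow citizens, I am again called upon by the voice of my country.",
                 doc2 = "When the occasion proper for it shall arrive."))
# Illustrative document variable; the field name "speaker" is an assumption
docvars(toks, "speaker") <- c("A", "B")

chunks <- tokens_chunk(toks, size = 5)
ntoken(chunks)    # token count per chunk; none should exceed `size`
docnames(chunks)  # names assigned to the new chunk documents
docvars(chunks)   # docvar values repeated for each chunk (use_docvars = TRUE)

# With use_docvars = FALSE the chunks are expected to come back
# without the "speaker" field
chunks_bare <- tokens_chunk(toks, size = 5, use_docvars = FALSE)
docvars(chunks_bare)
```

Comparing sum(ntoken(chunks)) with sum(ntoken(toks)) is a quick check that no tokens were lost when overlap = 0.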
[Package quanteda version 4.0.2 Index]