split_segments {rainette}    R Documentation
Split a character string or corpus into segments
Description
Split a character string or corpus into segments, taking punctuation into account where possible.
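This page does not describe the splitting procedure itself. The following base R code is a hypothetical, simplified sketch of punctuation-aware splitting, not rainette's implementation. It assumes whitespace-separated words, treats a word ending in `.`, `!` or `?` as a preferred ("strong") break and a word ending in `,`, `;` or `:` as a fallback ("weak") break, and cuts at the exact target size when no punctuation falls inside the window. The helper name `sketch_split()` and its default window are illustrative assumptions.

```r
# Hypothetical sketch of punctuation-aware segmentation (NOT rainette's code).
# Assumptions: words are separated by whitespace; words ending in . ! ? are
# strong breaks, words ending in , ; : are weak breaks.
sketch_split <- function(text, segment_size = 40, segment_size_window = NULL) {
  # Illustrative default window, chosen for this sketch only
  if (is.null(segment_size_window)) segment_size_window <- segment_size %/% 4
  words <- strsplit(trimws(text), "\\s+")[[1]]
  segments <- character(0)
  start <- 1
  while (start <= length(words)) {
    remaining <- length(words) - start + 1
    # The rest fits within size + window: keep it as the last segment
    if (remaining <= segment_size + segment_size_window) {
      segments <- c(segments, paste(words[start:length(words)], collapse = " "))
      break
    }
    # Candidate end positions: target size plus or minus the window (in words)
    target <- start + segment_size - 1
    candidates <- seq(max(start, target - segment_size_window),
                      target + segment_size_window)
    strong <- candidates[grepl("[.!?]$", words[candidates])]
    weak <- candidates[grepl("[,;:]$", words[candidates])]
    # Among candidates, pick the one closest to the target size
    pick <- function(pos) pos[which.min(abs(pos - target))]
    if (length(strong) > 0) {
      end <- pick(strong)
    } else if (length(weak) > 0) {
      end <- pick(weak)
    } else {
      end <- target
    }
    segments <- c(segments, paste(words[start:end], collapse = " "))
    start <- end + 1
  }
  segments
}

txt <- "One two three. Four five six, seven eight nine ten. Eleven twelve"
res <- sketch_split(txt, segment_size = 4, segment_size_window = 1)
print(res)
```

With a target of 4 words and a window of 1, each cut lands on the nearest sentence end when one is available, falls back to the comma in the second segment, and keeps the short tail as a final segment.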
Usage
split_segments(obj, segment_size = 40, segment_size_window = NULL)
## S3 method for class 'character'
split_segments(obj, segment_size = 40, segment_size_window = NULL)
## S3 method for class 'Corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)
## S3 method for class 'corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)
## S3 method for class 'tokens'
split_segments(obj, segment_size = 40, segment_size_window = NULL)
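As the usage lines show, `split_segments()` dispatches on the class of `obj`, so quanteda tokens can be passed directly. A minimal sketch follows. quanteda's `tokens()` keeps punctuation by default, which presumably gives the splitter punctuation to work with (an assumption). The return type for tokens input is not documented on this page, so the sketch prints it rather than relying on it.

```r
library(rainette)
library(quanteda)

# Tokenize the first inaugural address; punctuation is kept by default
toks <- tokens(head(data_corpus_inaugural, 1))

# Dispatches to the 'tokens' method; aim for segments of about 30 words
seg_toks <- split_segments(toks, segment_size = 30)

class(seg_toks)  # inspect what the tokens method returns
ndoc(seg_toks)   # number of resulting segments
```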
Arguments
obj: character string, quanteda or tm corpus object, or quanteda tokens object
segment_size: target segment size (in words)
segment_size_window: window around segment_size in which to look for the best splitting point
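The two size arguments can be set explicitly. The sketch below uses the first two inaugural addresses to keep it small, and asks for segments of about 20 words. It assumes that `segment_size_window = 5` lets the split point move up to about 5 words either side of that target. The exact semantics of the window are not spelled out here.

```r
library(rainette)
library(quanteda)

# Small corpus: the first two inaugural addresses
corp <- head(data_corpus_inaugural, 2)

# Target ~20-word segments; the window (assumed to be in words) gives the
# function room to move each cut towards punctuation
seg <- split_segments(corp, segment_size = 20, segment_size_window = 5)

ndoc(corp)  # documents before splitting
ndoc(seg)   # segments after splitting
```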
Value
If obj is a tm or quanteda corpus object, the result is a quanteda corpus.
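To illustrate the return value for tm input, here is a minimal sketch. It builds a one-document tm `VCorpus` from a made-up 180-word text and checks that the result is a quanteda corpus, as stated above.

```r
library(rainette)
library(tm)

# Illustrative text: 20 repetitions of a 9-word sentence (180 words)
txt <- paste(rep("The quick brown fox jumps over the lazy dog.", 20), collapse = " ")
tm_corp <- VCorpus(VectorSource(txt))

# Dispatches to the 'Corpus' method; the result is a quanteda corpus
seg <- split_segments(tm_corp)

class(seg)
quanteda::ndoc(seg)  # number of segments produced
```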
Examples
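library(rainette)
library(quanteda)
# Split each inaugural address into segments of about 40 words (the default segment_size)
split_segments(data_corpus_inaugural)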
[Package rainette version 0.3.1.1 Index]