| stri_split_boundaries {stringi} | R Documentation | 
Split a String at Text Boundaries
Description
This function locates text boundaries (like character, word, line, or sentence boundaries) and splits strings at the indicated positions.
Usage
stri_split_boundaries(
  str,
  n = -1L,
  tokens_only = FALSE,
  simplify = FALSE,
  ...,
  opts_brkiter = NULL
)
Arguments
| str | character vector or an object coercible to one | 
| n | integer vector, maximal number of strings to return | 
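| tokens_only | single logical value; may affect the result if n is positive, see Details | 
| simplify | single logical value; if TRUE or NA, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value | 
| ... | additional settings for opts_brkiter | 
| opts_brkiter | a named list with ICU BreakIterator's settings,
see stri_opts_brkiter; NULL for the default break iterator, i.e., line_break | 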
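As a minimal sketch of how the last two arguments relate, the snippet below passes the same break iterator setting once through ... and once through opts_brkiter. The input string and the variable names are illustrative assumptions only; stri_opts_brkiter() is the settings helper listed under See Also.

```r
library(stringi)

x <- "Spam, spam, eggs, bacon, and spam."

# Setting supplied directly through '...'
via_dots <- stri_split_boundaries(x, type = "word")

# The same setting wrapped in an explicit options list
via_opts <- stri_split_boundaries(
  x,
  opts_brkiter = stri_opts_brkiter(type = "word")
)

# Both calls are expected to produce the same split
identical(via_dots, via_opts)
```

Passing settings through ... is usually the shorter form; an explicit opts_brkiter list may be convenient when the same settings are reused across several calls.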
Details
Vectorized over str and n.
If n is negative (the default), then all text pieces are extracted.
Otherwise, if tokens_only is FALSE (which is the default),
then n-1 tokens are extracted (if possible) and the n-th string
gives the (non-split) remainder (see Examples).
On the other hand, if tokens_only is TRUE,
then only full tokens (up to n pieces) are extracted.
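The interplay of n and tokens_only described above can be sketched as follows. The sample strings and variable names are illustrative assumptions; word boundaries with skip_word_none=TRUE are used only to keep the pieces easy to read.

```r
library(stringi)

x <- "alpha beta gamma delta"

# n negative (the default): all text pieces are extracted
all_pieces <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)

# n = 2, tokens_only = FALSE: at most two strings,
# the last one holding the non-split remainder
with_rest <- stri_split_boundaries(x, n = 2, type = "word",
                                   skip_word_none = TRUE)

# n = 2, tokens_only = TRUE: at most two full tokens, no remainder
tokens <- stri_split_boundaries(x, n = 2, tokens_only = TRUE,
                                type = "word", skip_word_none = TRUE)

# Vectorized over str and n: a separate limit for each input string
per_string <- stri_split_boundaries(c("a b c", "d e f"), n = c(2, -1),
                                    type = "word", skip_word_none = TRUE)

print(all_pieces)
print(with_rest)
print(tokens)
print(per_string)
```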
For more information on text boundary analysis
performed by ICU's BreakIterator, see
stringi-search-boundaries.
Value
If simplify=FALSE (the default),
then the function returns a list of character vectors.
Otherwise, stri_list2matrix with byrow=TRUE
and n_min=n arguments is called on the resulting object.
In such a case, a character matrix with length(str) rows
is returned. The fill argument of stri_list2matrix
is set to an empty string when simplify is TRUE
and to NA when simplify is NA.
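A short sketch of the three return shapes, assuming two illustrative input strings with a different number of words each, so that the padding of the shorter row becomes visible:

```r
library(stringi)

x <- c("one two three", "four five")

# simplify = FALSE (the default): a list with one character vector per string
as_list <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)

# simplify = TRUE: a character matrix, short rows padded with empty strings
m_empty <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE,
                                 simplify = TRUE)

# simplify = NA: a character matrix, short rows padded with NA
m_na <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE,
                              simplify = NA)

print(as_list)
print(m_empty)
print(m_na)
```

The NA variant may be preferable when empty strings are legitimate tokens and missing cells need to be told apart from them.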
Author(s)
Marek Gagolewski and other contributors
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other search_split: 
about_search,
stri_split_lines(),
stri_split()
Other locale_sensitive: 
%s<%(),
about_locale,
about_search_boundaries,
about_search_coll,
stri_compare(),
stri_count_boundaries(),
stri_duplicated(),
stri_enc_detect2(),
stri_extract_all_boundaries(),
stri_locate_all_boundaries(),
stri_opts_collator(),
stri_order(),
stri_rank(),
stri_sort_key(),
stri_sort(),
stri_trans_tolower(),
stri_unique(),
stri_wrap()
Other text_boundaries: 
about_search_boundaries,
about_search,
stri_count_boundaries(),
stri_extract_all_boundaries(),
stri_locate_all_boundaries(),
stri_opts_brkiter(),
stri_split_lines(),
stri_trans_tolower(),
stri_wrap()
Examples
library(stringi)  # attaches the stri_* functions and the %s+% operator
test <- 'The\u00a0above-mentioned    features are very useful. ' %s+%
   'Spam, spam, eggs, bacon, and spam. 123 456 789'
stri_split_boundaries(test, type='line')
stri_split_boundaries(test, type='word')
stri_split_boundaries(test, type='word', skip_word_none=TRUE)
stri_split_boundaries(test, type='word', skip_word_none=TRUE, skip_word_letter=TRUE)
stri_split_boundaries(test, type='word', skip_word_none=TRUE, skip_word_number=TRUE)
stri_split_boundaries(test, type='sentence')
stri_split_boundaries(test, type='sentence', skip_sentence_sep=TRUE)
stri_split_boundaries(test, type='character')
# a filtered break iterator with the new ICU:
stri_split_boundaries('Mr. Jones and Mrs. Brown are very happy.
So am I, Prof. Smith.', type='sentence', locale='en_US@ss=standard') # ICU >= 56 only