strsplit {stringx}    R Documentation

Split Strings into Tokens

Description

Splits each string into chunks delimited by occurrences of a given pattern.

Usage

strsplit(
  x,
  pattern = split,
  ...,
  ignore_case = ignore.case,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  ignore.case = FALSE,
  split
)

Arguments

x

character vector whose elements are to be examined

pattern

character vector of nonempty search patterns

...

further arguments to stri_split, e.g., omit_empty, locale, dotall

ignore_case

single logical value; indicates whether matching should be case-insensitive

fixed

single logical value; FALSE for matching with regular expressions (see about_search_regex); TRUE for fixed pattern matching (about_search_fixed); NA for the Unicode collation algorithm (about_search_coll)

perl, useBytes

not used; a warning is issued if either is set [DEPRECATED]

ignore.case

alias to the ignore_case argument [DEPRECATED]

split

alias to the pattern argument [DEPRECATED]
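
The following is a minimal sketch of how the main arguments interact, not an exhaustive reference. It assumes that the stringx package (and stringi, on which it depends) is installed; the input strings and the variable names are made up for illustration.

```r
# Assumption: stringx and stringi are installed.
x <- "a.b.c"

# fixed=TRUE: the pattern "." is treated as a literal dot
by_fixed <- stringx::strsplit(x, ".", fixed=TRUE)

# fixed=FALSE (the default): the pattern is a regular expression,
# so a literal dot has to be escaped
by_regex <- stringx::strsplit(x, "\\.")

# ignore_case=TRUE: the pattern "x" also matches "X"
by_icase <- stringx::strsplit("aXbxc", "x", ignore_case=TRUE)

# omit_empty is passed via ... to stri_split and drops empty tokens
no_empty <- stringx::strsplit("a,,b", ",", omit_empty=TRUE)

print(by_fixed[[1]])  # "a" "b" "c"
print(by_regex[[1]])  # "a" "b" "c"
print(by_icase[[1]])  # "a" "b" "c"
print(no_empty[[1]])  # "a" "b"
```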

Details

This function is fully vectorised with respect to both x and pattern.
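
A short sketch of what vectorisation over both arguments means in practice: each string can be paired with its own pattern, and a shorter argument is recycled. It assumes stringx is installed; the inputs are illustrative only.

```r
# Assumption: stringx and stringi are installed.
x <- c("a,b;c", "d;e,f")

# One pattern per string: "," applies to x[1], ";" applies to x[2]
res_pairwise <- stringx::strsplit(x, c(",", ";"), fixed=TRUE)
print(res_pairwise[[1]])  # "a" "b;c"
print(res_pairwise[[2]])  # "d" "e,f"

# A single string with several patterns: the string is recycled,
# giving one list element per pattern
res_recycled <- stringx::strsplit("a,b;c", c(",", ";"), fixed=TRUE)
print(length(res_recycled))  # 2
print(res_recycled[[2]])     # "a,b" "c"
```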

For splitting text into 'characters' (grapheme clusters), words, or sentences, use stri_split_boundaries instead.

Value

Returns a list of character vectors representing the identified tokens.
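
As a rough illustration of working with the returned list, the sketch below reuses the input from the Examples section. It assumes stringx is installed; the variable names tokens and counts are hypothetical.

```r
# Assumption: stringx and stringi are installed.
tokens <- stringx::strsplit(c(x="a, b", y="c,d,  e"), ",\\s*")

# One list element per input string; extract a single one with [[ ]]
print(tokens[[2]])  # "c" "d" "e"

# Number of tokens found in each input string
counts <- unname(lengths(tokens))
print(counts)  # 2 3

# Flatten into a single character vector when grouping is not needed
all_tokens <- unlist(tokens, use.names=FALSE)
print(all_tokens)  # "a" "b" "c" "d" "e"
```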

Differences from Base R

A replacement for base strsplit, implemented with stri_split.
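
One difference that is easy to observe concerns a delimiter at the very end of a string. The sketch below assumes that stringx is installed and that stri_split is used with its default omit_empty=FALSE; it is meant as an illustration, not as a complete list of differences.

```r
# Assumption: stringx and stringi are installed.
x <- "a,b,"

# base R drops the empty token after a trailing delimiter
base_tokens <- base::strsplit(x, ",")[[1]]
print(base_tokens)  # "a" "b"

# stringx (via stri_split) keeps it unless omit_empty=TRUE is passed
stringx_tokens <- stringx::strsplit(x, ",")[[1]]
print(stringx_tokens)  # "a" "b" ""
```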

Author(s)

Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): paste, nchar, grepl, gsub, substr

Examples

stringx::strsplit(c(x="a, b", y="c,d,  e"), ",\\s*")
x <- strcat(c(
    "abc", "123", ",!.", "\U0001F4A9",
    "\U0001F64D\U0001F3FC\U0000200D\U00002642\U0000FE0F",
    "\U000026F9\U0001F3FF\U0000200D\U00002640\U0000FE0F",
    "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
))
# be careful when splitting into individual code points:
base::strsplit(x, "")  # stringx does not support this
stringx::strsplit(x, "(?s)(?=.)", omit_empty=TRUE)  # look-ahead for any char with dot-all
stringi::stri_split_boundaries(x, type="character")  # grapheme clusters


[Package stringx version 0.2.8 Index]