string_split {stringmagic}    R Documentation

Splits a character string with respect to a pattern

Description

Splits a character string with respect to a pattern.

Usage

string_split(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)

stsplit(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)

Arguments

x

A character vector.

split

A character scalar used to split the elements of x. By default it is a regular expression. You can add flags to the pattern in the form "flag1, flag2/pattern". Available flags are ignore (case-insensitive matching), fixed (no regex), word (add word boundaries) and magic (add interpolation with "{}"). Example: with "ignore/hello", a text containing "Hello" is split at "Hello". Shortcut: use the first letters of the flags. Example: "iw/one" splits at the word "one" (flags 'ignore' + 'word').

simplify

Logical scalar, default is TRUE. If TRUE, then when the vector input x is of length 1, a character vector is returned instead of a list.

fixed

Logical, default is FALSE. Whether to consider the argument split as fixed (and not as a regular expression).

ignore.case

Logical scalar, default is FALSE. If TRUE, then case insensitive search is triggered.

word

Logical scalar, default is FALSE. If TRUE, then a) word boundaries are added to the pattern, and b) patterns can be chained by separating them with a comma; chained patterns are combined with a logical OR. Example: if word = TRUE, then pattern = "The, mountain" will match either the word 'The' or the word 'mountain'.

envir

Environment in which to evaluate the interpolations if the flag "magic" is provided. Default is parent.frame().

Value

If simplify = TRUE (default), the object returned is a character vector when x is of length 1, and a list of the same length as x otherwise.

If simplify = FALSE, the object returned is always a list.
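
The return shapes described here can be checked directly. The following is a minimal sketch, assuming the stringmagic package is installed; the input strings and variable names are made up for illustration.

```r
library(stringmagic)

# length-1 input with simplify = TRUE (the default): a plain character vector
one <- string_split("a-b-c", "-")
is.character(one)

# longer input: a list with one element per element of x
many <- string_split(c("a-b", "c-d-e"), "-")
is.list(many)
length(many)

# simplify = FALSE: a list, even for a length-1 input
kept <- string_split("a-b-c", "-", simplify = FALSE)
is.list(kept)
```

Setting simplify = FALSE is the safer choice in code where the length of x is not known in advance, since the result is then always a list.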

Functions

stsplit(): alias to string_split().

Generic regular expression flags

All stringmagic functions support generic flags in regular-expression patterns. The flags are useful to quickly give extra instructions, similarly to usual regular expression flags.

Here the syntax is "flag1, flag2/pattern": a comma-separated list of flag names, separated from the pattern by a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1. Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise have meant "any character". The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.

Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".

The four flags always available are: "ignore", "fixed", "word" and "magic".
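
As an illustration of these flags applied to string_split, here is a short sketch, assuming the stringmagic package is installed. The input strings and variable names are invented for the example; the "magic" flag is shown in the Examples section.

```r
library(stringmagic)

# "fixed": the dot is a literal dot, not "any character"
by_dot <- string_split("a.b.c", "fixed/.")

# same thing with the argument instead of the flag
by_arg <- string_split("a.b.c", ".", fixed = TRUE)

# "ignore", abbreviated to its initial: case-insensitive matching
by_hello <- string_split("left Hello right", "i/hello")

# "word": only the standalone word "is" matches, not the "is" inside "This"
by_word <- string_split("This is fine", "w/is")

# collated initials: "iw/" combines "ignore" and "word"
by_iw <- string_split("take One step", "iw/one")

by_dot
by_hello
by_word
by_iw
```

Note that the text on each side of the split keeps its surrounding spaces, since only the matched pattern is removed.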

Examples


time = "This is the year 2024."

# we break the sentence
string_split(time, " ")

# simplify = FALSE leads to a list
string_split(time, " ", simplify = FALSE)

# let's break at "is"
string_split(time, "is")

# now breaking at the word "is"
# NOTE: we use the flag `word` (`w/`)
string_split(time, "w/is")

# same but using a pattern from a variable
# NOTE: we use the `magic` flag
pat = "is"
string_split(time, "mw/{pat}")



[Package stringmagic version 1.1.2 Index]