string_split {stringmagic} | R Documentation |
Splits a character string with respect to a pattern
Description
Splits a character string with respect to a pattern.
Usage
string_split(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)
stsplit(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)
Arguments
x |
A character vector. |
split |
A character scalar used to split the elements of x. By default it is interpreted as a regular expression. You can use flags in the pattern in the form "flag1, flag2/pattern" (see the section on generic regular expression flags). |
simplify |
Logical scalar, default is TRUE. |
fixed |
Logical, default is FALSE. |
ignore.case |
Logical scalar, default is FALSE. |
word |
Logical scalar, default is FALSE. |
envir |
Environment in which to evaluate the interpolations if the flag "magic" is used. |
Value
If simplify = TRUE (default), the object returned is:
- a character vector if x, the vector in input, is of length 1: the character vector contains the result of the split.
- a list of the same length as x. The ith element of the list is a character vector containing the result of the split of the ith element of x.
If simplify = FALSE, the object returned is always a list.
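The return rule above can be sketched in base R. This is only an illustration of the documented shape of the result, not the implementation used by stringmagic: the helper name split_simplify is hypothetical, and base strsplit() stands in for the package's own splitting.

```r
# Hypothetical helper mimicking the documented return rule of string_split().
# Assumption: base strsplit() is used as a stand-in for the actual split.
split_simplify <- function(x, split, simplify = TRUE) {
  pieces <- strsplit(x, split, perl = TRUE)  # always a list, one element per string
  # A length-1 input is unwrapped to a plain character vector when simplify = TRUE
  if (simplify && length(x) == 1) pieces[[1]] else pieces
}

one  <- split_simplify("a b c", " ")                    # character vector
many <- split_simplify(c("a b", "c d e"), " ")          # list of length 2
kept <- split_simplify("a b c", " ", simplify = FALSE)  # list of length 1
```

Setting simplify = FALSE is useful in code that must handle inputs of any length uniformly, since the result is then always a list.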
Functions
- stsplit(): Alias to string_split
Generic regular expression flags
All stringmagic functions support generic flags in regular-expression patterns. The flags are a quick way to give extra instructions, similarly to usual regular expression flags.
Here the syntax is "flag1, flag2/pattern". That is: flags are a comma-separated list of flag names, separated from the pattern with a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1.
Here the flag "fixed" removes the regular expression meaning of "." which would have otherwise meant "any character". The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.
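The difference between the two calls can be reproduced with base R alone. The sketch below uses grepl() as a stand-in for string_which(); it assumes nothing about stringmagic's internals and only shows why a literal dot and a regex dot select different elements.

```r
x <- c("hello...", "world")

# Literal dot: only "hello..." contains the character "."
fixed_hits <- which(grepl(".", x, fixed = TRUE))

# Regex dot: "." matches any character, so both elements match
regex_hits <- which(grepl(".", x))
```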
Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".
The four flags always available are: "ignore", "fixed", "word" and "magic".
"ignore" makes the pattern case-insensitive. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.
"fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.
"word" adds word boundaries ("\\b" in regex language) to the pattern. Further, the comma (",") becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example: string_clean("Am I ambushed?", "wi/am") leads to " I ambushed?" thanks to the flags "ignore" and "word".
"magic" allows variables to be interpolated inside the pattern before regex interpretation. For example, if letters = "aiou" then string_clean("My great goose!", "magic/[{letters}] => e") leads to "My greet geese!".
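The translations described for the four flags can be approximated in base R, which may help when reading a flagged pattern. This is a sketch of the equivalent regular expressions only, not of how stringmagic processes flags; the variable name vowels is chosen here for illustration (the text uses letters, which would shadow a base R constant).

```r
x <- "Am I ambushed?"

# "ignore": prepend the perl flag (?i) to make the match case-insensitive
ignore_hit <- grepl("(?i)AMBUSHED", x, perl = TRUE)

# "fixed": "[" is a plain character, so "dt[" can be searched as-is
fixed_hit <- grepl("dt[", "x = dt[1]", fixed = TRUE)

# "word" + "ignore": "wi/am" corresponds roughly to "(?i)\\b(am)\\b"
# "Am" is removed, while the "am" inside "ambushed" is not a whole word
word_clean <- gsub("(?i)\\b(am)\\b", "", x, perl = TRUE)

# "magic": build the pattern from a variable before the regex is interpreted
vowels <- "aiou"
magic_pattern <- paste0("[", vowels, "]")
magic_clean <- gsub(magic_pattern, "e", "My great goose!")
```

The last two results match the outputs quoted above: " I ambushed?" and "My greet geese!".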
Examples
time = "This is the year 2024."
# we break the sentence
string_split(time, " ")
# simplify = FALSE leads to a list
string_split(time, " ", simplify = FALSE)
# let's break at "is"
string_split(time, "is")
# now breaking at the word "is"
# NOTE: we use the flag `word` (`w/`)
string_split(time, "w/is")
# same but using a pattern from a variable
# NOTE: we use the `magic` flag
pat = "is"
string_split(time, "mw/{pat}")