string_extract {stringmagic} | R Documentation |
Extracts a pattern from a character vector
Description
Extracts the first match, or all matches, of a pattern from each element of a character vector.
Usage
string_extract(
x,
pattern,
single = FALSE,
simplify = TRUE,
fixed = FALSE,
ignore.case = FALSE,
word = FALSE,
unlist = FALSE,
envir = parent.frame()
)
stextract(
x,
pattern,
single = FALSE,
simplify = TRUE,
fixed = FALSE,
ignore.case = FALSE,
word = FALSE,
unlist = FALSE,
envir = parent.frame()
)
Arguments
x |
A character vector. |
pattern |
A character scalar. It represents the pattern
to be extracted from x. |
single |
Logical scalar, default is FALSE. |
simplify |
Logical scalar, default is TRUE. |
fixed |
Logical scalar, default is FALSE. |
ignore.case |
Logical scalar, default is FALSE. |
word |
Logical scalar, default is FALSE. |
unlist |
Logical scalar, default is FALSE. |
envir |
Environment in which to evaluate the interpolations if the flag "magic" is provided in the pattern. Default is parent.frame(). |
Value
The object returned by this function can be a list or a character vector.
If single = TRUE, a character vector is returned, containing the value of the first match.
If no match is found, an empty string is returned.
If single = FALSE (the default) and simplify = TRUE (default), the object returned is:
- a character vector if x, the input vector, is of length 1: the character vector contains all the matches and is of length 0 if no match is found.
- a list of the same length as x otherwise. The ith element of the list is a character vector of the matches for the ith element of x.
If single = FALSE (default) and simplify = FALSE, the object returned is always a list.
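The return shapes described above can be checked with a short sketch. It assumes the stringmagic package is installed; the input vector `x` and the digit pattern are made up for illustration.

```r
library(stringmagic)

# Hypothetical input: two strings containing one and two runs of digits
x = c("one 1", "two 22 222")

# single = TRUE: one value per element of x, hence a character vector
first = string_extract(x, "\\d+", single = TRUE)
is.character(first)

# default (single = FALSE, simplify = TRUE) with length(x) > 1: a list
all_matches = string_extract(x, "\\d+")
is.list(all_matches)
length(all_matches) == length(x)

# same call on a length-1 input: simplified to a character vector
one_input = string_extract(x[2], "\\d+")
is.character(one_input)

# simplify = FALSE: always a list, even for a length-1 input
kept_list = string_extract(x[2], "\\d+", simplify = FALSE)
is.list(kept_list)
```

Each logical check is expected to be TRUE if the function behaves as documented in this section.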
Functions
- stextract(): Alias to string_extract
Generic regular expression flags
All stringmagic functions support generic flags in regular-expression patterns.
The flags are useful to quickly give extra instructions, similarly to usual
regular expression flags.
Here the syntax is "flag1, flag2/pattern". That is: flags are a comma-separated list of flag names separated from the pattern with a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1.
Here the flag "fixed" removes the regular expression meaning of "." which would have otherwise meant "any character".
The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.
Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".
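As a quick check of the two notations, the sketch below passes the same flags in long form and in collated form. It assumes stringmagic is installed; the vector `x` is invented for the example.

```r
library(stringmagic)

x = c("DT[1]", "dt[2]", "data")  # hypothetical input

# long form: comma-separated flag names before the slash
long_form = string_which(x, "ignore, fixed/dt[")

# collated form: initials of the same flags ("i" for ignore, "f" for fixed)
short_form = string_which(x, "if/dt[")

# both notations describe the same search, so the results should agree
identical(long_form, short_form)
```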
The four flags always available are: "ignore", "fixed", "word" and "magic".
- "ignore" instructs to ignore the case. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.
- "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.
- "word" adds word boundaries ("\\b" in regex language) to the pattern. Further, the comma (",") becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example: string_clean("Am I ambushed?", "wi/am") leads to " I ambushed?" thanks to the flags "ignore" and "word".
- "magic" allows interpolating variables inside the pattern before regex interpretation. For example, if letters = "aiou" then string_clean("My great goose!", "magic/[{letters}] => e") leads to "My greet geese!".
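To make the effect of each flag concrete, here is a rough base R sketch of comparable calls. It is only an approximation written for illustration, not how stringmagic implements the flags. The variable names are assumptions, including `vowels`, which is used in place of `letters` to avoid masking the base R constant.

```r
x = c("hello...", "world")

# "fixed": the dot is a literal character, so only the first element matches
fixed_hits = which(grepl(".", x, fixed = TRUE))

# no flag: the dot means "any character", so both elements match
regex_hits = which(grepl(".", x))

# "ignore" + "word": prepend "(?i)" and wrap the pattern in word boundaries
cleaned = gsub("(?i)\\b(am)\\b", "", "Am I ambushed?", perl = TRUE)

# "magic": build the pattern from a variable before the regex is interpreted
vowels = "aiou"
replaced = gsub(paste0("[", vowels, "]"), "e", "My great goose!")

print(fixed_hits)
print(regex_hits)
print(cleaned)
print(replaced)
```

The four results correspond to the values quoted in the text: 1, 1:2, " I ambushed?" and "My greet geese!".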
Examples
cars = head(row.names(mtcars))
# Let's extract the first word:
string_extract(cars, "\\w+", single = TRUE)
# same using flags
string_extract(cars, "s/\\w+")
# extract all words composed of only letters
# NOTE: we use the flag word (`w/`)
string_extract(cars, "w/[[:alpha:]]+")
# version without flag:
string_extract(cars, "\\b[[:alpha:]]+\\b")
# If x is of length 1 => a character vector is returned
greet = "Hi Tom, how's Mary doing?"
string_extract(greet, "w/[[:upper:]]\\w+")
# version with simplify = FALSE => a list is returned
string_extract(greet, "w/[[:upper:]]\\w+", simplify = FALSE)