regexpr2 {stringx}    R Documentation

Locate Pattern Occurrences

Description

regexpr2 and gregexpr2 locate, respectively, the first and all (i.e., global) occurrences of a pattern in each string. regexec2 and gregexec2 additionally pinpoint the matches to parenthesised subexpressions (regex capture groups).

Usage

regexpr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)

gregexpr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)

regexec2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)

gregexec2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)

regexpr(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)

gregexpr(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)

regexec(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)

gregexec(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)

Arguments

x

character vector whose elements are to be examined

pattern

character vector of nonempty search patterns

...

further arguments to stri_locate, e.g., omit_empty, locale, dotall

ignore_case, ignore.case

single logical value; indicates whether matching should be case-insensitive

fixed

single logical value; FALSE for matching with regular expressions (see about_search_regex); TRUE for fixed pattern matching (about_search_fixed); NA for the Unicode collation algorithm (about_search_coll)

perl, useBytes

not used; a warning is issued if they are set [DEPRECATED]

text

alias to the x argument [DEPRECATED]
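
As an illustration of the fixed argument described above, the following minimal sketch contrasts the three matching modes. It assumes that stringx is installed and attached; the input strings are made up for this example.

```r
library(stringx)

x <- c("a.c", "abc")

# fixed=FALSE (default): "." is a regex metacharacter matching any character
r_regex <- regexpr2(x, ".", fixed=FALSE)

# fixed=TRUE: "." is treated as a literal dot
r_fixed <- regexpr2(x, ".", fixed=TRUE)

# fixed=NA: matching via the Unicode collation algorithm
r_coll <- regexpr2(x, "b", fixed=NA)

as.integer(r_regex)  # first character matches in both strings
as.integer(r_fixed)  # a literal dot occurs only in the first string
```

With fixed=TRUE, the second string contains no literal dot, so its start position is reported as -1.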

Details

These functions are fully vectorised with respect to both x and pattern.
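
A small sketch of what vectorisation over both x and pattern means in practice, assuming stringx is attached; the inputs are invented for illustration.

```r
library(stringx)

x <- c("banana", "cherry", "date")
pattern <- c("an", "rr", "z")

# element-wise: the i-th pattern is sought in the i-th string
elementwise <- regexpr2(x, pattern)

# a single string is recycled against several patterns
recycled <- regexpr2("banana", c("b", "n", "a"))

as.integer(elementwise)
as.integer(recycled)
```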

Use substrl and gsubstrl to extract or replace the identified chunks. Also, consider using regextr2 and gregextr2 directly instead.

Value

regexpr2 and [DEPRECATED] regexpr return an integer vector which gives the start positions of the first substrings matching a pattern. The match.length attribute gives the corresponding match lengths. If there is no match, both the start position and the match length are set to -1.
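
The following sketch shows one way to unpack such a result into start positions, lengths, and end positions. It assumes stringx is attached; the helper variables (start, len, found, end) are names chosen for this example only.

```r
library(stringx)

x <- c("acacaca", "gaca", "actgggca", NA)
m <- regexpr2(x, "aca")

start <- as.integer(m)                       # start positions (-1: no match)
len   <- as.integer(attr(m, "match.length")) # match lengths (-1: no match)

# a match was found where the start is neither missing nor -1
found <- !is.na(start) & start > 0

# inclusive end positions; NA where there is nothing to report
end <- ifelse(found, start + len - 1L, NA_integer_)
```

Note that a missing input yields NA rather than -1, so the two cases can be told apart.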

gregexpr2 and [DEPRECATED] gregexpr yield a list whose elements are integer vectors with match.length attributes, giving the positions of all the matches. For consistency with regexpr2, a no-match is denoted with a single -1, hence the output is guaranteed to consist of non-empty integer vectors.
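
Because a no-match is encoded as a single -1, the length of a list element is not a match count. A minimal sketch of counting matches under that convention, assuming stringx is attached (n_matches is a name introduced here for illustration):

```r
library(stringx)

x <- c("acacaca", "xyz")
g <- gregexpr2(x, "aca")

# every element is non-empty, even where nothing was found
lengths(g)

# count genuine matches by ignoring the -1 placeholder
n_matches <- vapply(g, function(v) sum(v > 0), integer(1))
```

By default the matches do not overlap, so "acacaca" yields two occurrences of "aca" here.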

regexec2 and [DEPRECATED] regexec return a list of integer vectors giving the positions of the first matches and the locations of matches to the consecutive parenthesised subexpressions (which can only be recognised if fixed=FALSE). Each vector is equipped with the match.length attribute.

gregexec2 and [DEPRECATED] gregexec generate a list of matrices, where each column corresponds to a separate match; the first row is the start index of the match, the second row gives the position of the first captured group, and so forth. Their match.length attributes are matrices of corresponding sizes.
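
To make the matrix layout concrete, here is a small sketch with two capture groups, assuming stringx is attached. The pattern and input are invented for this example.

```r
library(stringx)

g <- gregexec2("acacaca", "(a)(c)")
m <- g[[1]]

dim(m)                 # rows: whole match and capture groups; columns: matches
m[1, ]                 # start positions of the whole matches
m[3, ]                 # start positions of the second capture group
attr(m, "match.length")  # a matrix of the same size holding the lengths
```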

These functions preserve the attributes of the longest inputs (unless they are dropped due to coercion). Missing values in the inputs are propagated consistently.

Differences from Base R

These functions are replacements for base gregexpr (and others), implemented with stri_locate.

Author(s)

Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): paste, nchar, strsplit, gsub2, grepl2, gregextr2, gsubstrl

Examples

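# a named character vector with one missing value:
x <- c(aca1="acacaca", aca2="gaca", noaca="actgggca", na=NA)
# first match of a pattern with a back-reference, ignoring case:
regexpr2(x, "(A)[ACTG]\\1", ignore_case=TRUE)
regexpr2(x, "aca") >= 0  # like grepl2
# all occurrences of a fixed pattern, including overlapping ones:
gregexpr2(x, "aca", fixed=TRUE, overlap=TRUE)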

# two named capture groups:
regexec2(x, "(?<x>a)(?<y>cac?)")
gregexec2(x, "(?<x>a)(?<y>cac?)")

# extraction:
gsubstrl(x, gregexpr2(x, "(A)[ACTG]\\1", ignore_case=TRUE))
gregextr2(x, "(A)[ACTG]\\1", ignore_case=TRUE)  # equivalent


[Package stringx version 0.2.8 Index]