sub2 {stringx} R Documentation

Replace Pattern Occurrences

Description

sub2 replaces the first pattern occurrence in each string with a given replacement string. gsub2 replaces all (i.e., 'globally') pattern matches.

Usage

sub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

gsub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

sub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)

gsub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)

Arguments

x

character vector with strings whose chunks are to be modified

pattern

character vector of nonempty search patterns

replacement

character vector with the corresponding replacement strings; in sub2 and gsub2, back-references (whenever fixed=FALSE) are indicated by $0..$99 and $<name>, whereas the base-R-compatible sub and gsub only allow \1..\9

...

further arguments to stri_replace_first or stri_replace_all, e.g., locale, dotall

ignore_case, ignore.case

single logical value; indicates whether matching should be case-insensitive

fixed

single logical value; FALSE for matching with regular expressions (see about_search_regex); TRUE for fixed pattern matching (about_search_fixed); NA for the Unicode collation algorithm (about_search_coll)

perl, useBytes

not used; a warning is issued if either is specified [DEPRECATED]
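
A minimal sketch of how fixed and ignore_case change what is matched, assuming the stringx package is installed; the input string is made up for illustration:

```r
x <- "a.b.C"

# fixed = FALSE (default): "." is a regex that matches any character
r_regex <- stringx::sub2(x, ".", "-")                 # "-.b.C"

# fixed = TRUE: "." is matched literally
r_fixed <- stringx::sub2(x, ".", "-", fixed = TRUE)   # "a-b.C"

# ignore_case = TRUE: lower-case "c" also matches "C"
r_icase <- stringx::gsub2(x, "c", "z", ignore_case = TRUE)  # "a.b.z"
```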

Details

Not to be confused with substr.

These functions are fully vectorised with respect to x, pattern, and replacement.
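
As an illustration of the vectorisation, the following sketch (assuming stringx is installed; the data are hypothetical) pairs the elements of x, pattern, and replacement one by one, and recycles shorter arguments:

```r
x <- c("aaa", "bbb", "ccc")

# elementwise: the i-th pattern and replacement are applied to the i-th string
v_first <- stringx::sub2(x, c("a", "b", "c"), c("1", "2", "3"))
# "1aa" "2bb" "3cc"

# a single pattern and replacement are recycled over all elements of x
v_all <- stringx::gsub2(x, "[ab]", "-")
# "---" "---" "ccc"

# a single string is recycled over several patterns
v_rec <- stringx::sub2("abc", c("a", "b", "c"), "_")
# "_bc" "a_c" "ab_"
```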

gsub2 uses vectorise_all=TRUE because of the attribute preservation rules; call stri_replace_all directly if different behaviour is needed.
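
The difference can be sketched as follows, assuming both stringx and stringi are installed. Here the regex variant stri_replace_all_regex from stringi is called directly, with vectorize_all = FALSE so that all patterns are applied to each string in turn:

```r
# gsub2 pairs each pattern with a (recycled) copy of x: two results
g_vec <- stringx::gsub2("abc", c("a", "b"), c("X", "Y"))
# "Xbc" "aYc"

# stringi with vectorize_all = FALSE applies every pattern to the same string
g_seq <- stringi::stri_replace_all_regex(
    "abc", c("a", "b"), c("X", "Y"), vectorize_all = FALSE
)
# "XYc"
```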

The [DEPRECATED] sub and [DEPRECATED] gsub simply call sub2 and gsub2, which have a cleaned-up argument list. Additionally, if fixed=FALSE, the back-references in replacement strings are converted to those accepted by the ICU regex engine.
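
A short sketch of the two back-reference notations, assuming stringx is installed; the date string and pattern are illustrative only. Both calls should yield the same result, as the \1-style references passed to sub are translated before being handed to the regex engine:

```r
x <- "2024-05-17"
p <- "(\\d+)-(\\d+)-(\\d+)"

# sub2: x comes first, back-references use the $n notation
d_new <- stringx::sub2(x, p, "$3/$2/$1")

# sub: base-R argument order, back-references use the \n notation
d_old <- stringx::sub(p, "\\3/\\2/\\1", x)

identical(d_new, d_old)
```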

Value

Both functions return a character vector. They preserve the attributes of the longest inputs (unless they are dropped due to coercion).
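
For instance, a named input vector is expected to keep its names in the result. A minimal sketch, assuming stringx is installed (the names and strings are hypothetical):

```r
x <- c(first = "apple pie", second = "apple tart")

y <- stringx::sub2(x, "apple", "cherry")

names(y)    # the names of x carry over to the result
unname(y)   # the modified strings themselves
```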

Differences from Base R

These functions are replacements for base sub and gsub, implemented with stri_replace_first and stri_replace_all, respectively.

Author(s)

Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): paste, nchar, grepl2, gregexpr2, gregextr2, strsplit, gsubstr

trimws for removing whitespaces (amongst others) from the start or end of strings

Examples

"change \U0001f602 me \U0001f603" |> gsub2("\\p{L}+", "O_O")

x <- c("mario", "Mario", "M\u00E1rio", "M\u00C1RIO", "Mar\u00EDa", "Rosario", NA)
sub2(x, "mario", "M\u00E1rio", fixed=NA, strength=1L)
sub2(x, "mario", "Mario", fixed=NA, strength=2L)

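x <- "abcdefghijklmnopqrstuvwxyz"
p <- "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"  # 13 capture groups
# base R supports only \1..\9 as back-references:
base::sub(p, "\\1\\9", x)
base::gsub(p, "\\1\\9", x)
base::gsub(p, "\\1\\9", x, perl=TRUE)
# here "\\13" is read as \1 followed by a literal "3":
base::gsub(p, "\\1\\13", x)
# sub2 and gsub2 accept $0..$99, so $13 refers to the 13th group:
sub2(x, p, "$1$13")
gsub2(x, p, "$1$13")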



[Package stringx version 0.2.8 Index]