substr {stringx}    R Documentation

Extract or Replace Substrings

Description

substr and substrl extract contiguous parts of given character strings. substr selects them by start and end positions, whereas substrl uses start positions and substring lengths.

Their replacement versions allow for substituting parts of strings with new content.

gsubstr and gsubstrl allow for extracting or replacing multiple chunks from each string.
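The three extraction styles can be compared on the string used in the Examples. This is a minimal sketch that assumes stringx is installed; attaching it masks base::substr and base::substring. With length = stop - start + 1, substrl should select the same chunk as substr, while gsubstr takes lists so that several chunks can be cut from one string.

```r
library(stringx)  # attaching stringx masks base::substr and base::substring

x <- "spam, spam, bacon, and spam"

# substr: start and (inclusive) stop positions
a <- substr(x, 13, 17)    # "bacon"

# substrl: start position and length; here length = 17 - 13 + 1 = 5
b <- substrl(x, 13, 5)    # "bacon"

# gsubstr: several chunks of the same string, passed as lists
g <- gsubstr(x, list(c(1, 13)), list(c(4, 17)))  # list(c("spam", "bacon"))
```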

Usage

substr(x, start = 1L, stop = -1L)

substrl(
  x,
  start = 1L,
  length = attr(start, "match.length"),
  ignore_negative_length = FALSE
)

substr(x, start = 1L, stop = -1L) <- value

substrl(x, start = 1L, length = attr(start, "match.length")) <- value

gsubstr(x, start = list(1L), stop = list(-1L))

gsubstrl(
  x,
  start = list(1L),
  length = lapply(start, attr, "match.length"),
  ignore_negative_length = TRUE
)

gsubstr(x, start = list(1L), stop = list(-1L)) <- value

gsubstrl(x, start = list(1L), length = lapply(start, attr, "match.length")) <- value

substring(text, first = 1L, last = -1L)

substring(text, first = 1L, last = -1L) <- value

Arguments

x, text

character vector whose parts are to be extracted/replaced

start, first

numeric vector (for substr and substrl) or list of numeric vectors (for gsubstr and gsubstrl) giving the start indexes; e.g., 1 denotes the first code point; negative indexes count from the end of a string, i.e., -1 denotes the last code point

stop, last

numeric vector (for substr) or list of numeric vectors (for gsubstr) giving the end indexes (inclusive); note that a start position greater than the end position denotes an empty substring at that position (see Examples)

length

numeric vector (for substrl) or list of numeric vectors (for gsubstrl) giving the substring lengths; negative lengths result in a missing value, an empty vector (see ignore_negative_length), or, in the replacement versions, the corresponding substring being left unchanged

ignore_negative_length

single logical value; whether negative lengths should be ignored or yield missing values

value

character vector (for substr and substrl) or list of character vectors (for gsubstr and gsubstrl) defining the replacement strings
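The index conventions described above can be checked directly. A short sketch, assuming stringx is installed: negative positions count from the end, a start beyond the stop gives an empty string, and, with the default ignore_negative_length = FALSE, substrl turns a negative length into a missing value.

```r
library(stringx)

x <- "spam, spam, bacon, and spam"

# negative indexes count from the end; stop defaults to -1 (the last code point)
last4 <- substr(x, -4)          # "spam"
mid2  <- substr(x, -4, -3)      # "sp"

# start greater than stop: an empty substring
empty <- substr("spam", 3, 2)   # ""

# substrl with the default ignore_negative_length = FALSE:
# a negative length yields a missing value
neg <- substrl("spam", 2, -1)   # NA
ok  <- substrl("spam", 2, 2)    # "pa"
```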

Details

Not to be confused with sub.

substring is a [DEPRECATED] synonym for substr.

Note that these functions operate on code points and can therefore split meaningful Unicode code point sequences (e.g., a base character followed by combining marks), especially when inputs are not normalised. For extracting initial parts of strings based on character width, see strtrim.

Note that gsubstr (and related functions) expect start, stop, length, and value to be lists. Non-list arguments will be converted by calling as.list. This is different from the default policy applied by stri_sub_all, which calls list.
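The effect of the as.list conversion can be sketched as follows, assuming stringx is installed and that, as in stri_sub_all, x is recycled to the length of the list arguments. A list keeps both positions together for one string; a plain vector is split into one list element per position.

```r
library(stringx)

x <- "spam, spam, bacon, and spam"

# list arguments: both chunks are taken from the single string x
r1 <- gsubstr(x, list(c(1, 13)), list(c(4, 17)))  # list(c("spam", "bacon"))

# non-list arguments go through as.list(), i.e., list(1, 13) and list(4, 17);
# x is recycled, so each list element yields one chunk
r2 <- gsubstr(x, c(1, 13), c(4, 17))              # list("spam", "bacon")
```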

Note that substrl and gsubstrl are interoperable with regexpr2 and gregexpr2, respectively, and hence can be considered substitutes for the [DEPRECATED] regmatches (which is more specialised).
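Because the default length argument reads the "match.length" attribute of start, the output of the matchers can be passed straight in. A sketch, assuming stringx is installed and that regexpr2, like gregexpr2 in the Examples, takes the text first and the pattern second; as.character is used in the checks only to drop any attributes copied over from the match positions.

```r
library(stringx)

x <- c("spam, spam, bacon", "eggs and bacon")

# first match in each string: positions plus a "match.length" attribute
first <- substrl(x, regexpr2(x, "b[a-z]+"))

# all matches in each string: a list of positions, consumed by gsubstrl
words <- gsubstrl(x, gregexpr2(x, "[a-z]+"))

print(first)
print(words)
```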

Value

substr and substrl return a character vector (in UTF-8). gsubstr and gsubstrl return a list of character vectors.

Their replacement versions modify x 'in-place' (see Examples).
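A minimal sketch of the replacement forms, assuming stringx is installed. Each call rebinds the variable on the left, as any R replacement function does; the replacement need not match the length of the chunk it substitutes, and gsubstr<- takes one character vector of replacements per string.

```r
library(stringx)

x <- "spam, spam, bacon, and spam"

# substr<-: a shorter replacement shrinks the string
y <- x
substr(y, 1, 4) <- "ham"          # "ham, spam, bacon, and spam"

# gsubstr<-: several chunks of one string replaced at once
z <- x
gsubstr(z, list(c(1, 13)), list(c(4, 17))) <- list(c("eggs", "toast"))
# z is now "eggs, spam, toast, and spam"
```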

The attributes are copied from the longest arguments (similar to binary operators).
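For instance, names carried by x should survive extraction when x is the longest argument. A small sketch, assuming stringx is installed:

```r
library(stringx)

x <- c(a = "spam", b = "eggs")

# x (length 2) is longer than start and stop (length 1),
# so its attributes, here the names, are copied to the result
y <- substr(x, 1, 2)
print(y)
```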

Differences from Base R

Replacements for and enhancements of base substr and substring, implemented with stri_sub and stri_sub_all.
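Two of the practical differences, mirroring the Examples, can be sketched side by side, assuming stringx is installed. base::substr returns one string per element of x, so only the first start/stop pair is used here, and base `substr<-` never changes the string's length, so only the first four characters of the replacement are inserted.

```r
library(stringx)

x <- "spam, spam, bacon, and spam"

# extraction: base uses only the first start/stop pair since length(x) == 1
b1 <- base::substr(x, c(1, 13), c(4, 17))             # "spam"
s1 <- substr(x, c(1, 13), c(4, 17))                   # "spam" "bacon"

# replacement: base truncates "porridge" to the 4-character chunk
b2 <- base::`substr<-`(x, 1, 4, value = "porridge")   # "porr, spam, bacon, and spam"
s2 <- `substr<-`(x, 1, 4, value = "porridge")         # "porridge, spam, bacon, and spam"
```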

Author(s)

Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): strtrim, nchar, startsWith, endsWith, gregexpr

Examples

x <- "spam, spam, bacon, and spam"
base::substr(x, c(1, 13), c(4, 17))
base::substring(x, c(1, 13), c(4, 17))
substr(x, c(1, 13), c(4, 17))
substrl(x, c(1, 13), c(4, 5))

# replacement function used as an ordinary one - return a copy of x:
base::`substr<-`(x, 1, 4, value="jam")
`substr<-`(x, 1, 4, value="jam")
base::`substr<-`(x, 1, 4, value="porridge")
`substr<-`(x, 1, 4, value="porridge")

# interoperability with gregexpr2:
p <- "[\\w&&[^a]][\\w&&[^n]][\\w&&[^d]]\\w+"  # regex: all words but 'and'
gsubstrl(x, gregexpr2(x, p))
`gsubstrl<-`(x, gregexpr2(x, p), value=list(c("a", "b", "c", "d")))

# replacement function modifying x in-place:
substr(x, 1, 4) <- "eggs"
substr(x, 1, 0) <- "porridge, "        # prepend (start>stop)
substr(x, nchar(x)+1) <- " every day"  # append (start>stop)
print(x)




[Package stringx version 0.2.8 Index]