R: Convenience functions in support of regular expressions

re_convenience {mclm}

R Documentation

Convenience functions in support of regular expressions

Description

These functions are essentially simple wrappers around base R functions such as regexpr(), gregexpr(), grepl(), grep(), sub() and gsub(). The most important differences between the functions documented here and their base R counterparts are the order of the arguments (x comes before pattern) and the fact that the argument perl is set to TRUE by default.
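As an illustration of these two differences, the sketch below contrasts a base R call with a hypothetical wrapper written in the same spirit. The name `has_matches_sketch()` is made up for this example, and the code is an assumption about how such a wrapper could look, not the package's actual source.

```r
x <- c("apple", "banana", "cherry")

# Base R: pattern comes first, and perl must be switched on explicitly
# for PCRE-only syntax such as the lookahead (?=a).
base_result <- grepl("an(?=a)", x, perl = TRUE)

# Hypothetical wrapper (illustrative only): x comes first, perl defaults to TRUE.
has_matches_sketch <- function(x, pattern, ignore.case = FALSE, perl = TRUE,
                               fixed = FALSE, useBytes = FALSE) {
  grepl(pattern, x, ignore.case = ignore.case, perl = perl,
        fixed = fixed, useBytes = useBytes)
}

wrapper_result <- has_matches_sketch(x, "an(?=a)")
identical(base_result, wrapper_result)
```

Putting x first makes such functions convenient to use in pipelines, where the data is passed as the first argument.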

Usage

re_retrieve_first(
  x,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  requested_group = NULL,
  drop_NA = FALSE,
  ...
)

re_retrieve_last(
  x,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  requested_group = NULL,
  drop_NA = FALSE,
  ...
)

re_retrieve_all(
  x,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  requested_group = NULL,
  unlist = TRUE,
  ...
)

re_has_matches(
  x,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  ...
)

re_which(
  x,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  ...
)

re_replace_first(
  x,
  pattern,
  replacement,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  ...
)

re_replace_all(
  x,
  pattern,
  replacement,
  ignore.case = FALSE,
  perl = TRUE,
  fixed = FALSE,
  useBytes = FALSE,
  ...
)

Arguments

`x`	Character vector to be searched or modified.
`pattern`	Regular expression specifying what is to be searched.
`ignore.case`	Logical. Should the search be case insensitive?
`perl`	Logical. Whether the regular expressions use the PCRE flavor of regular expression. Unlike in base R functions, the default is `TRUE`.
`fixed`	Logical. If `TRUE`, `pattern` is a string to be matched as is, i.e. wildcards and special characters are not interpreted.
`useBytes`	Logical. If `TRUE` the matching is done byte-by-byte rather than character-by-character. See 'Details' in `grep()`.
`requested_group`	Numeric. If `NULL` or `0`, the output will contain matches for `pattern` as a whole. If another number `n` is provided, then the output will not contain matches for `pattern` as a whole but only the matches for the `n`th capturing group in `pattern` (the first if `requested_group = 1`, the second if `requested_group = 2`, and so on).
`drop_NA`	Logical. If `FALSE`, the output always has the same length as the input `x` and items that do not contain a match for `pattern` yield `NA`. If `TRUE`, such `NA` values are removed and therefore the result might contain fewer items than `x`.
`...`	Additional arguments.
`unlist`	Logical. If `FALSE`, the output always has the same length as the input `x`. More specifically, the result will be a list in which input items that do not contain a match for `pattern` yield an empty vector, whereas input items that do match will yield a vector of at least length one (depending on the number of matches). If `TRUE`, the output is a single vector the length of which may be shorter or longer than `x`.
`replacement`	Character vector of length one specifying the replacement string. It is to be taken literally, except that the notation `\\1`, `\\2`, etc. can be used to refer to groups in `pattern`.
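The behaviour described for `requested_group` and `drop_NA` can be approximated in base R. The following is a minimal sketch under that assumption; `first_group_sketch()` is a hypothetical helper introduced here for illustration and is not part of the package.

```r
x <- c("sentence", "is", "words")
pattern <- "[oe](.)(.)"

# regexec() locates the first match in each element together with its
# capturing groups; regmatches() extracts the matched text.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Hypothetical helper: return the whole match (group 0) or the n-th group.
# Elements without a match yield NA, so the result keeps the length of x.
first_group_sketch <- function(m, requested_group = 0) {
  vapply(m, function(hit) {
    if (length(hit) == 0) NA_character_ else hit[requested_group + 1]
  }, character(1))
}

whole  <- first_group_sketch(m)       # whole match per element
group1 <- first_group_sketch(m, 1)    # first capturing group per element
kept   <- whole[!is.na(whole)]        # the effect described for drop_NA = TRUE
```

Here `whole` has one entry per element of `x`, while `kept` is shorter because the element without a match has been dropped.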

Details

For details on some of the arguments (e.g. perl, fixed), see base R's regex documentation (`?regex`).
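To make the role of `fixed` concrete, the short base R sketch below matches the same pattern once as a regular expression and once as a literal string. It uses grepl() directly and makes no claim about how the wrappers combine `fixed` with their `perl = TRUE` default.

```r
x <- c("a.b", "axb")

# As a regular expression, "." matches any single character.
as_regex <- grepl("a.b", x, perl = TRUE)

# With fixed = TRUE, "." is an ordinary character and must occur literally.
as_literal <- grepl("a.b", x, fixed = TRUE)
```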

Value

re_retrieve_first(), re_retrieve_last() and re_retrieve_all() return either a single vector of character data or a list containing such vectors. re_replace_first() and re_replace_all() return the same type of character vector as x.
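The distinction between a list result and a single vector can be seen with the base R functions gregexpr() and regmatches(). This is a sketch of the general idea, not the package's implementation.

```r
x <- c("sentence", "is", "a")
pattern <- "[oe]."

# One list entry per element of x; elements without a match give character(0).
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
n_per_item  <- lengths(all_matches)

# Flattening the list drops the empty entries, so the length of the
# result no longer has to equal length(x).
flat <- unlist(all_matches)
```

In this example the list has three entries, one per input item, whereas the flattened vector has only two, both taken from the first item.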

re_has_matches() returns a logical vector indicating whether a match was found in each of the elements in x; re_which() returns a numeric vector indicating the indices of the elements of x for which a match was found.
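The relation between the two kinds of result can be shown with the underlying base R functions grepl() and grep(), called here directly with the pattern as first argument.

```r
x <- c("This", "is", "a", "sentence")

# Logical vector: one TRUE/FALSE per element of x.
has <- grepl("s$", x, perl = TRUE)

# Integer vector: positions of the elements that match.
idx <- grep("s$", x, perl = TRUE)

# The indices are exactly the positions at which the logical vector is TRUE.
same <- identical(idx, which(has))
```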

Functions

re_retrieve_first(): Retrieve from each item in x the first match of pattern.
re_retrieve_last(): Retrieve from each item in x the last match of pattern.
re_retrieve_all(): Retrieve from each item in x all matches of pattern.
re_has_matches(): Simple wrapper around grepl().
re_which(): Simple wrapper around grep().
re_replace_first(): Simple wrapper around sub().
re_replace_all(): Simple wrapper around gsub().
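The difference between replacing the first match and replacing all matches, as well as the use of `\\1` in the replacement, can be illustrated with sub() and gsub() themselves. Note that in these base R calls the pattern comes first and x comes third.

```r
x <- c("sentence", "words")

# sub() replaces only the first match in each element.
first_only <- sub("([oe].)", "{\\1}", x, perl = TRUE)

# gsub() replaces every match in each element.
every_match <- gsub("([oe].)", "{\\1}", x, perl = TRUE)
```

In both calls `\\1` inserts the text captured by the first group, so each match is wrapped in braces rather than removed.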

Examples

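library(mclm)

# Tokenize a sentence; the functions below operate on each token.
x <- tokenize("This is a sentence with a couple of words in it.")
# An "o" or "e" followed by two characters, each captured in its own group.
pattern <- "[oe](.)(.)"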

re_retrieve_first(x, pattern)
re_retrieve_first(x, pattern, drop_NA = TRUE)
re_retrieve_first(x, pattern, requested_group = 1)
re_retrieve_first(x, pattern, drop_NA = TRUE, requested_group = 1)
re_retrieve_first(x, pattern, requested_group = 2)

re_retrieve_last(x, pattern)
re_retrieve_last(x, pattern, drop_NA = TRUE)
re_retrieve_last(x, pattern, requested_group = 1)
re_retrieve_last(x, pattern, drop_NA = TRUE, requested_group = 1)
re_retrieve_last(x, pattern, requested_group = 2)

re_retrieve_all(x, pattern)
re_retrieve_all(x, pattern, unlist = FALSE)
re_retrieve_all(x, pattern, requested_group = 1)
re_retrieve_all(x, pattern, unlist = FALSE, requested_group = 1)
re_retrieve_all(x, pattern, requested_group = 2)

re_replace_first(x, "([oe].)", "{\\1}")
re_replace_all(x, "([oe].)", "{\\1}")

[Package mclm version 0.2.7 Index]