R: Extract Regular Expression Matches Into a Data Frame

re_match {rematch2}

R Documentation

Extract Regular Expression Matches Into a Data Frame

Description

re_match wraps regexpr and returns the match results in a convenient data frame. The data frame has one column for each capture group if perl=TRUE, and one final column called .match for the matching (sub)string. The capture group columns are named if the groups themselves are named.

Usage

re_match(text, pattern, perl = TRUE, ...)

Arguments

`text`	Character vector.
`pattern`	A regular expression. See `regex` for more about regular expressions.
`perl`	Logical. Should Perl-compatible regular expressions be used? Defaults to TRUE; setting it to FALSE disables capture groups.
`...`	Additional arguments to pass to `regexpr`.

Value

A data frame of character vectors: one column per capture group, named if the group was named, and additional columns for the input text and the first matching (sub)string. Each row corresponds to an element in the text vector.
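
As a minimal sketch of the structure described above, the following inspects the result for a pattern with named groups. It assumes rematch2 is installed and that the input text and full match columns are named .text and .match, as in version 2.x; the variable names are illustrative only.

```r
library(rematch2)

# Two inputs: one that matches the pattern and one that does not
dates <- c("2016-04-20", "not a date")
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"

res <- re_match(text = dates, pattern = isodaten)

# One column per named group, followed by the input text and the full match
names(res)

# One row per element of the input vector
nrow(res) == length(dates)

# Group columns are character vectors; non-matching elements yield NA
res$year
is.na(res$.match)
```

Because each row lines up with one input element, the result can be combined with the original data by position without any further bookkeeping.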

Note

re_match uses PCRE-compatible regular expressions by default (i.e. perl = TRUE in regexpr). You can switch this off, but if you do so, capture groups will no longer be reported, as they are only supported by PCRE.
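
The effect of switching PCRE off can be sketched as follows. This is an illustration under the assumption that rematch2 2.x is installed and that the pattern is also valid in R's default (non-PCRE) regex syntax; the object names are hypothetical.

```r
library(rematch2)

dates <- c("2016-04-20", "not a date")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"

# Default: PCRE, so each parenthesised group gets its own column
with_groups <- re_match(text = dates, pattern = isodate)

# perl = FALSE: the pattern still matches, but group columns are not reported
no_groups <- re_match(text = dates, pattern = isodate, perl = FALSE)

ncol(with_groups)
ncol(no_groups)
names(no_groups)
```

Named groups such as (?<year>...) are PCRE syntax, so a pattern that uses them should be kept with perl = TRUE.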

Examples

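# Input containing full ISO dates, partial dates, and non-dates
dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
# Three unnamed capture groups: year, month, day
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)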

# The same with named groups
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)

[Package rematch2 version 2.1.2 Index]