nest_extract {nplyr}

R Documentation

Extract a character column into multiple columns using regex groups in a column of nested data frames

Description

nest_extract() uses a regular expression to extract capturing groups from a character column in each nested data frame into new columns. If the groups don't match, or the input is NA, the output will be NA.
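
The NA behaviour can be illustrated on a single, un-nested data frame. The sketch below calls tidyr::extract() directly, on the assumption (stated under Details) that nest_extract() applies the same operation to each nested data frame; the toy data frame `df` is hypothetical.

```r
library(tidyr)

# Hypothetical toy data: one matching value, one non-matching value, one NA
df <- tibble::tibble(comb = c("a-b", "ab", NA))

# "ab" has no hyphen, so the two-group regex does not match it
res <- extract(df, comb,
               into = c("var1", "var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")
res
```

Only the first row should yield values; the non-matching row and the NA row should be NA in both new columns.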

Usage

nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

`.data`	A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).
`.nest_data`	A list-column containing data frames.
`col`	Column name or position within `.nest_data` (must be present within all nested data frames in `.nest_data`). This is passed to `tidyselect::vars_pull()`. This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
`into`	Names of the new variables to create, as a character vector. Use `NA` to omit the variable in the output.
`regex`	A string representing a regular expression used to extract the desired values. There should be one group (defined by `()`) for each element of `into`.
`remove`	If `TRUE`, remove input column from output data frame.
`convert`	If `TRUE`, will run `type.convert()` with `as.is = TRUE` on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string `"NA"`s to be converted to `NA`s.
`...`	Additional arguments passed on to `tidyr::extract()` methods.
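
As a rough illustration of how `into`, `remove`, and `convert` interact, the sketch below again uses tidyr::extract() on a plain data frame, assuming that nest_extract() forwards these arguments unchanged. The data frame `df` and the column names `x` and `num` are hypothetical.

```r
library(tidyr)

# Hypothetical toy data; the last value ends in the string "NA"
df <- tibble::tibble(x = c("a-1", "b-2", "c-NA"))

# NA in `into` drops the first group; convert = TRUE runs type.convert()
# on the remaining new column; remove = FALSE keeps the input column `x`
res <- extract(df, x,
               into = c(NA, "num"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)",
               remove = FALSE,
               convert = TRUE)
res
```

Here `num` should come back as an integer column, with the string "NA" converted to a missing value as the `convert` note warns.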

Details

nest_extract() is a wrapper for tidyr::extract() and maintains the functionality of extract() within each nested data frame. For more information on extract() please refer to the documentation in 'tidyr'.
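
The idea of applying extract() within each nested data frame can be sketched by hand with lapply(). This is an illustration of the concept only, not nplyr's actual implementation; the data and the names `nested`, `nested_df`, and `manual` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with a list-column of data frames, one per id
nested <- tibble(id = c(1, 1, 2), comb = c("a-b", "c-d", NA)) %>%
  nest(nested_df = -id)

# Apply tidyr::extract() to every data frame in the list-column
manual <- nested %>%
  mutate(nested_df = lapply(nested_df, function(d) {
    tidyr::extract(d, comb,
                   into = c("var1", "var2"),
                   regex = "([[:alnum:]]+)-([[:alnum:]]+)")
  }))

manual$nested_df[[1]]
```

Under that assumption, a call such as nest_extract(nested, .nest_data = nested_df, col = comb, into = c("var1", "var2"), regex = "([[:alnum:]]+)-([[:alnum:]]+)") would be expected to give a comparable result without the explicit lapply().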

Value

An object of the same type as .data. Each object in the column .nest_data will have new columns created according to the capture groups specified in the regular expression.

Examples

library(dplyr)
library(nplyr)

set.seed(123)
gm <- gapminder::gapminder
gm <- gm %>%
  mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                       size = nrow(gm),
                       replace = TRUE))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

# Split `comb` into `var1` and `var2` within every nested data frame
gm_nest %>%
  nest_extract(.nest_data = country_data,
               col = comb,
               into = c("var1", "var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")

[Package nplyr version 0.2.0 Index]