R: Apply a function to each group

group_map {dplyr}

R Documentation

Apply a function to each group

Description

group_map(), group_modify() and group_walk() are purrr-style functions that can be used to iterate on grouped tibbles.

Usage

group_map(.data, .f, ..., .keep = FALSE)

group_modify(.data, .f, ..., .keep = FALSE)

group_walk(.data, .f, ..., .keep = FALSE)

Arguments

`.data`	A grouped tibble
`.f`	A function or formula to apply to each group. If a function, it is used as is. It should have at least 2 formal arguments. If a formula, e.g. `~ head(.x)`, it is converted to a function. In the formula, you can use `.` or `.x` to refer to the subset of rows of `.data` for the given group, and `.y` to refer to the key, a one-row tibble with one column per grouping variable that identifies the group.
`...`	Additional arguments passed on to `.f`
`.keep`	Whether the grouping variables are kept in `.x`. Defaults to `FALSE`.
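
A minimal sketch of how the arguments fit together, assuming dplyr is attached. The helper name `top_rows` and its `n` parameter are hypothetical, introduced only for illustration; `mtcars` is the built-in dataset used in the examples on this page.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# Hypothetical helper with the two required formals (.x, .y) plus one extra
top_rows <- function(.x, .y, n) {
  head(.x, n)
}

# `n = 3` is forwarded to top_rows() through `...`
res_fun <- group_map(by_cyl, top_rows, n = 3)

# The same computation written as a formula
res_formula <- group_map(by_cyl, ~ head(.x, 3))

# With the default .keep = FALSE the grouping column is dropped from .x;
# with .keep = TRUE it is retained
cols_default <- group_map(by_cyl, ~ names(.x))[[1]]
cols_kept <- group_map(by_cyl, ~ names(.x), .keep = TRUE)[[1]]

"cyl" %in% cols_default
"cyl" %in% cols_kept
```

The function form is usually easier to read once `.f` needs extra arguments; the formula form suits one-liners.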

Details

Use group_modify() when summarize() is too limited for what you need to compute and return for each group. group_modify() is good for "data frame in, data frame out". If that is too limited, you need to use a nested or split workflow. group_modify() is an evolution of do(), if you have used that before.

Each conceptual group of the data frame is exposed to the function .f with two pieces of information:

The subset of the data for the group, exposed as .x.
The key, a tibble with exactly one row and columns for each grouping variable, exposed as .y.
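
To see what `.x` and `.y` look like in practice, the following sketch records the key and the shape of the subset for each group. It assumes dplyr is attached; the name `info` and the fields of the returned list are illustrative choices, not part of the API.

```r
library(dplyr)

by_species <- group_by(iris, Species)

# For each group, capture the key and a description of the subset
info <- group_map(by_species, function(.x, .y) {
  list(key = .y, n_rows = nrow(.x), cols = names(.x))
})

info[[1]]$key     # one-row tibble with a single column, Species
info[[1]]$n_rows  # number of rows in the first group
info[[1]]$cols    # column names of .x; Species is absent by default
```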

For completeness, group_modify(), group_map() and group_walk() also work on ungrouped data frames. In that case the function is applied to the entire data frame (exposed as .x), and .y is a one-row tibble with no columns, consistent with group_keys().
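
The ungrouped case can be checked with a short sketch, assuming dplyr is attached. The variable name `res` and the list fields are illustrative.

```r
library(dplyr)

# mtcars is ungrouped, so .f is called once with the whole data frame
res <- group_map(mtcars, function(.x, .y) {
  list(key = .y, n_rows = nrow(.x))
})

length(res)          # a single call, hence a single element
dim(res[[1]]$key)    # the key has one row and no columns
res[[1]]$n_rows      # all rows of mtcars
```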

Value

group_modify() returns a grouped tibble; .f must therefore return a data frame.
group_map() returns a list of results from calling .f on each group.
group_walk() calls .f for side effects and returns the input .data, invisibly.
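
The three return types can be compared side by side. This is a sketch assuming dplyr is attached; the variable names are illustrative.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# group_map(): a plain list, one element per group
as_list <- group_map(by_cyl, ~ nrow(.x))

# group_modify(): a grouped tibble; the key columns are added back
as_tbl <- group_modify(by_cyl, ~ data.frame(n = nrow(.x)))

# group_walk(): the input itself, returned invisibly
walked <- group_walk(by_cyl, ~ invisible(NULL))

class(as_list)
names(as_tbl)
identical(walked, by_cyl)
```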

Examples


# return a list
mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))

# return a tibble grouped by `cyl` with 2 rows per group
# the grouping data is recalculated
mtcars %>%
  group_by(cyl) %>%
  group_modify(~ head(.x, 2L))


# a list of tibbles
iris %>%
  group_by(Species) %>%
  group_map(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))

# a restructured grouped tibble
iris %>%
  group_by(Species) %>%
  group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))


# a list of vectors
iris %>%
  group_by(Species) %>%
  group_map(~ quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)))

# to use group_modify() the lambda must return a data frame
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)) %>%
      tibble::enframe(name = "prob", value = "quantile")
  })

# five-number summary of each column, with a label for each statistic
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      purrr::map_dfc(fivenum) %>%
      mutate(nms = c("min", "Q1", "median", "Q3", "max"))
  })

# group_walk() is for side effects
dir.create(temp <- tempfile())
iris %>%
  group_by(Species) %>%
  group_walk(~ write.csv(.x, file = file.path(temp, paste0(.y$Species, ".csv"))))
list.files(temp, pattern = "csv$")
unlink(temp, recursive = TRUE)

# group_modify() and ungrouped data frames
mtcars %>%
  group_modify(~ head(.x, 2L))

[Package dplyr version 1.1.4]