R: Summarise each group in nested data frames to fewer rows

nest_summarise {nplyr}

R Documentation

Summarise each group in nested data frames to fewer rows

Description

nest_summarise() creates a new set of nested data frames. Each will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in .nest_data. Each nested data frame will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

nest_summarise() and nest_summarize() are synonyms.

Usage

nest_summarise(.data, .nest_data, ..., .groups = NULL)

nest_summarize(.data, .nest_data, ..., .groups = NULL)

Arguments

`.data`	A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).
`.nest_data`	A list-column containing data frames.
`...`	Name-value pairs of summary functions. The name will be the name of the variable in the result. The value can be: a vector of length 1, e.g. `min(x)`, `n()`, or `sum(is.na(y))`; a vector of length `n`, e.g. `quantile()`; or a data frame, to add multiple columns from a single expression.
`.groups`	Grouping structure of the result. Refer to `dplyr::summarise()` for more up-to-date information.
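
As a rough illustration of the kinds of values accepted in `...`, the sketch below mixes a length-1 summary with a data frame expression. The toy data and the names `toy`, `toy_nest`, `obs`, `lo`, and `hi` are made up for this example, and it assumes that unnamed data frame expressions are passed through to `dplyr::summarise()` unchanged.

```r
library(dplyr)   # n(), tibble(), and the %>% pipe
library(tidyr)   # nest()
library(nplyr)

# Hypothetical toy data: two groups of three observations each
toy <- tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 10, 20, 30)
)
toy_nest <- toy %>% nest(obs = -grp)

res <- toy_nest %>%
  nest_summarise(
    obs,
    n = n(),                          # a vector of length 1
    tibble(lo = min(x), hi = max(x))  # a data frame: contributes columns lo and hi
  )

# Inspect the summary computed for the first nested data frame
res$obs[[1]]
```

Each nested data frame in `obs` should then hold a single row with the columns `n`, `lo`, and `hi`.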

Details

nest_summarise() is largely a wrapper for dplyr::summarise() and maintains the functionality of summarise() within each nested data frame. For more information on summarise(), please refer to the documentation in dplyr.
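
To make the wrapper relationship concrete, the sketch below compares nest_summarise() with a hand-written equivalent that applies dplyr::summarise() to every element of the list-column. This is only an illustration of the behaviour described above, not necessarily how nplyr is implemented; the toy data and the names `toy_nest`, `obs`, `via_nplyr`, and `via_dplyr` are hypothetical.

```r
library(dplyr)   # summarise(), mutate(), tibble(), and the %>% pipe
library(tidyr)   # nest()
library(nplyr)

# Hypothetical toy data nested by grp
toy_nest <- tibble(grp = c("a", "a", "b"), x = c(1, 3, 10)) %>%
  nest(obs = -grp)

# Summarise inside each nested data frame with nplyr
via_nplyr <- toy_nest %>%
  nest_summarise(obs, mean_x = mean(x))

# Hand-written equivalent: apply dplyr::summarise() to each element of obs
via_dplyr <- toy_nest %>%
  mutate(obs = lapply(obs, function(df) summarise(df, mean_x = mean(x))))

# Compare the summaries for the first nested data frame
all.equal(via_nplyr$obs[[1]], via_dplyr$obs[[1]])
```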

Value

An object of the same type as .data. Each object in the column .nest_data will usually be of the same type as the input. Each object in .nest_data has the following properties:

The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups argument; the output may be another grouped_df, a tibble, or a rowwise data frame.
Data frame attributes are not preserved, because nest_summarise() fundamentally creates a new data frame for each object in .nest_data.
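
The effect of .groups on each nested data frame can be checked with a small sketch like the one below. It assumes that .groups is handed to dplyr::summarise() as-is, so dplyr's documented behaviour applies (by default, the last grouping level is dropped when every group yields one row). The toy data and the names `toy_nest`, `obs`, `k1`, `k2`, `kept`, and `dropped` are hypothetical.

```r
library(dplyr)   # group_vars(), tibble(), and the %>% pipe
library(tidyr)   # nest()
library(nplyr)

# Hypothetical toy data with two candidate grouping variables, k1 and k2
toy_nest <- tibble(
  grp = c("a", "a", "a", "a"),
  k1  = c("p", "p", "q", "q"),
  k2  = c("u", "v", "u", "v"),
  x   = 1:4
) %>%
  nest(obs = -grp)

grouped <- toy_nest %>%
  nest_group_by(obs, k1, k2)

# Default (.groups = NULL): dplyr drops the last grouping level, k2
kept <- grouped %>%
  nest_summarise(obs, total = sum(x))

# .groups = "drop": each nested result is returned ungrouped
dropped <- grouped %>%
  nest_summarise(obs, total = sum(x), .groups = "drop")

group_vars(kept$obs[[1]])     # grouping left after the default behaviour
group_vars(dropped$obs[[1]])  # no grouping variables remain
```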

Examples

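library(dplyr)  # provides n() and the %>% pipe used below
library(nplyr)

# nest the gapminder data by continent (requires the gapminder and tidyr packages)
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# a summary applied to ungrouped nested data frames returns a single row in each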
gm_nest %>%
  nest_summarise(
    country_data,
    n = n(),
    median_pop = median(pop)
  )

# usually, you'll want to group first
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_summarise(
    country_data,
    n = n(),
    median_pop = median(pop)
  )

[Package nplyr version 0.2.0 Index]