R: Lump rows of a tibble

lump_rows {tidytidbits}

R Documentation

Lump rows of a tibble

Description

A verb for a dplyr pipeline. In the given data frame, the .level column is treated as a set of levels and the .count column as the corresponding counts. The function returns a data frame in which the rows are lumped according to these levels and counts, controlled by the parameters n, prop, other_level and ties.method in the same way as for lump(). The row created for the lumped levels has level = other_level and count = the sum of the counts of all lumped rows. For the remaining columns, either a default summary is used, or you can provide custom summarising statements via the summarising_statements parameter: a list named by the columns you want to summarise, with statements wrapped in quo(), using the same syntax as in a call to summarise().

Usage

lump_rows(
  .df,
  .level,
  .count,
  summarising_statements = quos(),
  n,
  prop,
  remaining_levels,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

Arguments

`.df`	A data frame
`.level`	Column name (symbolic) containing a set of levels
`.count`	Column name (symbolic) containing counts of the levels
`summarising_statements`	The "lumped" rows need to have all their columns summarised into one row. This parameter is a vars() list of arguments as if used in a call to `summarise()`: each name is a column name and each value is the summarising expression. For any column without a statement, a default summary is used, which takes the sum if numeric, concatenates text, or uses any() if logical.
`n`	If specified, n rows shall be preserved.
`prop`	If specified, rows shall be preserved if their count >= prop
`remaining_levels`	Levels that should explicitly not be lumped
`other_level`	Name of the "other" level to be created from lumped rows
`ties.method`	Method to apply in case of ties

Value

The lumped data frame
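
The call pattern described above can be sketched as follows. This is an illustrative sketch, not an example shipped with the package: the data and the column names `fruit`, `count` and `note` are invented, and passing `summarising_statements` as a `quos()` list is an assumption based on the default shown under Usage. No output is shown because the exact form of the result has not been verified here.

```r
library(dplyr)        # provides tibble(), %>% and quos()
library(tidytidbits)  # provides lump_rows()

# Made-up data (assumption): one row per level, its count, and a text column
fruit_counts <- tibble(
  fruit = c("apple", "banana", "cherry", "kiwi", "plum"),
  count = c(50, 30, 10, 6, 4),
  note  = c("a", "b", "c", "d", "e")
)

# Preserve n = 2 rows and lump the rest into the default "Other" level;
# the note column falls back to the default summary for text columns
lumped <- fruit_counts %>%
  lump_rows(fruit, count, n = 2)

# Same lumping, with a custom summarising statement for the note column
# (assumed form: a quos() list named by column, as in summarise())
lumped_custom <- fruit_counts %>%
  lump_rows(
    fruit, count,
    summarising_statements = quos(note = paste(note, collapse = "/")),
    n = 2
  )
```

Going by the description, the lumped row should carry `other_level` in the level column and the summed count of the lumped rows in the count column.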
