lump_rows {tidytidbits}    R Documentation

Lump rows of a tibble

Description

A verb for a dplyr pipeline. In the given data frame, the .level column is treated as a set of levels and the .count column as the corresponding counts. The function returns a data frame in which the rows are lumped according to these levels and counts, using the parameters n, prop, other_level and ties.method as in lump(). The resulting row for the lumped levels has its level set to other_level and its count set to the sum of the counts of all lumped rows. For the remaining columns, either a default summary is used, or you can provide custom summarising statements via the summarising_statements parameter: provide a list named by the column you want to summarise, giving statements wrapped in quo(), using the same syntax as in a call to summarise().

Usage

lump_rows(
  .df,
  .level,
  .count,
  summarising_statements = quos(),
  n,
  prop,
  remaining_levels,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

Arguments

.df

A data frame

.level

Column name (symbolic) containing a set of levels

.count

Column name (symbolic) containing counts of the levels

summarising_statements

The "lumped" rows need to have all their columns summarised into one row. This parameter is a list of quoted expressions (as created by quos() or vars()), written as if they were arguments in a call to summarise(): each name is a column name and each value is the expression that summarises that column. If no statement is provided for a column, a default summary is used, which takes the sum if the column is numeric, concatenates the values if it is text, or uses any() if it is logical.

n

If specified, n rows shall be preserved.

prop

If specified, rows shall be preserved if their count >= prop.

remaining_levels

Levels that should explicitly not be lumped

other_level

Name of the "other" level to be created from lumped rows

ties.method

Method to apply in case of ties
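
Details

The default summary described for summarising_statements can be pictured with the following minimal sketch in base R. It is not the package's implementation: default_summary is a hypothetical helper written for illustration, and the separator used to concatenate text is an assumption, since this page does not state it.

```r
# Hypothetical helper, NOT part of tidytidbits: mimics the documented default
# summary applied to one column of the rows that are being lumped.
default_summary <- function(x, sep = ", ") {  # 'sep' is an assumption
  if (is.numeric(x)) {
    sum(x)                    # numeric columns: sum
  } else if (is.logical(x)) {
    any(x)                    # logical columns: any()
  } else {
    paste(x, collapse = sep)  # text columns: concatenate
  }
}

default_summary(c(5, 3))         # numeric input
default_summary(c("y", "z"))     # character input
default_summary(c(FALSE, TRUE))  # logical input
```

A custom statement in summarising_statements replaces this default for the column it names.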

Value

The lumped data frame

See Also

lump
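
Examples

The following is a minimal usage sketch based on the signature shown under Usage, assuming that dplyr and tidytidbits are installed. The data frame fruit_sales and its columns fruit, sold and organic are made up for illustration, and no output is shown because the exact result depends on the package version.

```r
library(dplyr)        # tibble(), quos(), %>%
library(tidytidbits)  # lump_rows()

# Illustrative data, invented for this sketch.
fruit_sales <- tibble(
  fruit   = c("apple", "banana", "cherry", "date"),
  sold    = c(50, 30, 5, 3),
  organic = c(TRUE, FALSE, FALSE, TRUE)
)

# Preserve n = 2 rows; the remaining rows are lumped into one row
# whose level is the default other_level, "Other".
lumped <- fruit_sales %>%
  lump_rows(fruit, sold, n = 2)

# Same call, with a custom summary for the 'organic' column and a
# different name for the lumped level.
lumped_custom <- fruit_sales %>%
  lump_rows(
    fruit, sold,
    summarising_statements = quos(organic = all(organic)),
    n = 2,
    other_level = "Other fruit"
  )
```

In the first call the organic column falls back to the default summary for logical columns; in the second, the statement supplied for organic is used instead.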


[Package tidytidbits version 0.3.2 Index]