R: Format column data containing bin/interval information

fmt_bins {gt}

R Documentation

Format column data containing bin/interval information

Description

When using cut() (or other functions that use it in some way) you get bins that can look like this: "(0,10]", "(10,15]", "(15,20]", "(20,40]". This interval notation expresses the lower and upper limits of each range. The square or round brackets define whether each of the endpoints are included in the range ([/⁠]⁠ for inclusion, (/⁠)⁠ for exclusion). Should bins of this sort be present in a table, the fmt_bins() function can be used to format that syntax to a form that presents better in a display table. It's possible to format the values of the intervals with the fmt argument, and, the separator can be modified with the sep argument.

Usage

fmt_bins(
  data,
  columns = everything(),
  rows = everything(),
  sep = "--",
  fmt = NULL
)

Arguments

`data`	The gt table data object `⁠obj:<gt_tbl>⁠` // required This is the gt table object that is commonly created through use of the `gt()` function.
`columns`	Columns to target `⁠<column-targeting expression>⁠` // default: `everything()` Can either be a series of column names provided in `c()`, a vector of column indices, or a select helper function (e.g. `starts_with()`, `ends_with()`, `contains()`, `matches()`, `num_range()` and `everything()`).
`rows`	Rows to target `⁠<row-targeting expression>⁠` // default: `everything()` In conjunction with `columns`, we can specify which of their rows should undergo formatting. The default `everything()` results in all rows in `columns` being formatted. Alternatively, we can supply a vector of row captions within `c()`, a vector of row indices, or a select helper function (e.g. `starts_with()`, `ends_with()`, `contains()`, `matches()`, `num_range()`, and `everything()`). We can also use expressions to filter down to the rows we need (e.g., `⁠[colname_1] > 100 & [colname_2] < 50⁠`).
`sep`	Separator between values `⁠scalar<character>⁠` // default: `"--"` The separator text that indicates the values are ranged. The default value of `"--"` indicates that an en dash will be used for the range separator. Using `"---"` will be taken to mean that an em dash should be used. Should you want these special symbols to be taken literally, they can be supplied within `base::I()`.
`fmt`	Formatting expressions `⁠<single expression>⁠` // default: `NULL` (`optional`) An optional formatting expression in formula form. If used, the RHS of `~` should contain a formatting call (e.g., `⁠~ fmt_number(., decimals = 3, use_seps = FALSE⁠`).

Value

An object of class gt_tbl.

Compatibility of formatting function with data values

fmt_bins() is compatible with body cells that are of the "character" or "factor" types. Any other types of body cells are ignored during formatting. This is to say that cells of incompatible data types may be targeted, but there will be no attempt to format them.

Targeting cells with `columns` and `rows`

Targeting of values is done through columns and additionally by rows (if nothing is provided for rows then entire columns are selected). The columns argument allows us to target a subset of cells contained in the resolved columns. We say resolved because aside from declaring column names in c() (with bare column names or names in quotes) we can use tidyselect-style expressions. This can be as basic as supplying a select helper like starts_with(), or, providing a more complex incantation like

where(~ is.numeric(.x) && max(.x, na.rm = TRUE) > 1E6)

which targets numeric columns that have a maximum value greater than 1,000,000 (excluding any NAs from consideration).

By default all columns and rows are selected (with the everything() defaults). Cell values that are incompatible with a given formatting function will be skipped over, like character values and numeric ⁠fmt_*()⁠ functions. So it's safe to select all columns with a particular formatting function (only those values that can be formatted will be formatted), but, you may not want that. One strategy is to format the bulk of cell values with one formatting function and then constrain the columns for later passes with other types of formatting (the last formatting done to a cell is what you get in the final output).

Once the columns are targeted, we may also target the rows within those columns. This can be done in a variety of ways. If a stub is present, then we potentially have row identifiers. Those can be used much like column names in the columns-targeting scenario. We can use simpler tidyselect-style expressions (the select helpers should work well here) and we can use quoted row identifiers in c(). It's also possible to use row indices (e.g., c(3, 5, 6)) though these index values must correspond to the row numbers of the input data (the indices won't necessarily match those of rearranged rows if row groups are present). One more type of expression is possible, an expression that takes column values (can involve any of the available columns in the table) and returns a logical vector. This is nice if you want to base formatting on values in the column or another column, or, you'd like to use a more complex predicate expression.

Formatting expressions for `fmt`

We can supply a one-sided (RHS only) expression to fmt, and, several can be provided in a list. The expression uses a formatting function (e.g., fmt_number(), fmt_currency(), etc.) and it must contain an initial . that stands for the data object. If performing numeric formatting it might look something like this:

fmt = ~ fmt_number(., decimals = 1, use_seps = FALSE)

Examples

Use the countrypops dataset to create a gt table. Before even getting to the gt() call, we use cut() in conjunction with scales::breaks_log() to create some highly customized bins. Consequently each country's population in the 2021 year is assigned to a bin. These bins have a characteristic type of formatting that can be used as input to fmt_bins(), and using that formatting function allows us to customize the presentation of those ranges. For instance, here we are formatting the left and right values of the ranges with fmt_integer() (using formula syntax).

countrypops |>
  dplyr::filter(year == 2021) |>
  dplyr::select(country_code_2, population) |>
  dplyr::mutate(population_class = cut(
    population,
    breaks = scales::breaks_log(n = 20)(population)
    )
  ) |>
  dplyr::group_by(population_class) |>
  dplyr::summarize(
    count = dplyr::n(),
    countries = paste0(country_code_2, collapse = ",")
  ) |>
  dplyr::arrange(desc(population_class)) |>
  gt() |>
  fmt_flag(columns = countries) |>
  fmt_bins(
    columns = population_class,
    fmt = ~ fmt_integer(., suffixing = TRUE)
  ) |>
  cols_label(
    population_class = "Population Range",
    count = "",
    countries = "Countries"
  ) |>
  cols_width(
    population_class ~ px(150),
    count ~ px(50)
  ) |>
  tab_style(
    style = cell_text(style = "italic"),
    locations = cells_body(columns = count)
  )

This image of a table was generated from the first code example in the `fmt_bins()` help file.

Function ID

3-17

Function Introduced

v0.9.0 (Mar 31, 2023)