summary_rows {gt} | R Documentation |
Add group-wise summary rows using aggregation functions
Description
Add summary rows to one or more row groups by using the table data and any
suitable aggregation functions. Multiple summary rows can be added for
selected groups via expressions given to fns
. You can selectively format
the values in the resulting summary cells by use of formatting expressions in
fmt
.
Usage
summary_rows(
data,
groups = everything(),
columns = everything(),
fns = NULL,
fmt = NULL,
side = c("bottom", "top"),
missing_text = "---",
formatter = NULL,
...
)
Arguments
data |
The gt table data object
This is the gt table object that is commonly created through use of the
|
groups |
Specification of row group IDs
The row groups to which targeting operations are constrained. Can either be
a series of row group ID values provided in |
columns |
Columns to target
The columns for which the summaries should be calculated. Can either
be a series of column names provided in |
fns |
Aggregation Expressions
Functions used for aggregations. This can include base functions like
|
fmt |
Formatting expressions
Formatting expressions in formula form. The RHS of |
side |
Side used for placement of summary rows
Should the summary rows be placed at the |
missing_text |
Replacement text for
The text to be used in place of |
formatter |
Deprecated Formatting function
Deprecated, please use |
... |
Deprecated Formatting arguments
Deprecated (along with |
Value
An object of class gt_tbl
.
Using columns
to target column data for aggregation
Targeting of column data for which aggregates should be generated is done
through the columns
argument. We can declare column names in c()
(with
bare column names or names in quotes) or we can use
tidyselect-style expressions. This can be as basic as supplying a select
helper like starts_with()
, or, providing a more complex incantation like
where(~ is.numeric(.x) && max(.x, na.rm = TRUE) > 1E6)
which targets numeric columns that have a maximum value greater than
1,000,000 (excluding any NA
s from consideration).
By default all columns are selected (with the everything()
default). This
default may be not what's needed unless all columns can undergo useful
aggregation by expressions supplied in fns
.
Aggregation expressions for fns
There are a number of ways to express how an aggregation should work for
each summary row. In addition to that, we have the ability to pass important
information such as the summary row ID value and its label (the former
necessary for targeting within tab_style()
or tab_footnote()
and the
latter used for display in the rendered table). Here are a number of
instructive examples for how to supply such expressions.
Double-sided formula with everything supplied
We can be explicit and provide a double-sided formula (in the form
<LHS> ~ <RHS>
) that expresses everything about a summary row. That is, it
has an aggregation expression (where .
represents the data in the
focused column). Here's an example:
list(id = "minimum", label = "min") ~ min(., na.rm = TRUE)
The left side (the list) contains named elements that identify the id
and
label
for the summary row. The right side has an expression for obtaining
a minimum value (dropping NA
values in the calculation).
The list()
can be replaced with c()
but the advantage of a list is
allowing the use of the md()
and html()
helper functions. The above
example can be written as:
list(id = "minimum", label = md("**Minimum**")) ~ min(., na.rm = TRUE)
and we can have that label value interpreted as Markdown text.
Function names in quotes
With fns = "min"
we get the equivalent of the fuller expression:
list(id = "min", label = "min") ~ min(., na.rm = TRUE)
For sake of convenience, common aggregation functions with the na.rm
argument will be rewritten with the na.rm = TRUE
option. These functions
are: "min"
, "max"
, "mean"
, "median"
, "sd"
, and "sum"
.
Should you need to specify multiple aggregation functions in this way (giving
you multiple summary rows), use c()
or list()
.
RHS formula expressions
With fns = ~ min(.)
or fns = list(~ min(.))
, gt will use the function
name as the id
and label
. The expansion of this shorthand to full form
looks like this:
list(id = "min", label = "min") ~ min(.)
The RHS expression is kept as written and the name portion is both the id
and the label
.
Named vector or list with RHS formula expression
Using fns = c(minimum = ~ min(.))
or fns = list(minimum = ~ min(.))
expands to this:
list(id = "minimum", label = "minimum") ~ min(.)
Unnamed vector or list with RHS formula expression
With fns = c("minimum", "min") ~ min(.)
or
fns = list("minimum", "min") ~ min(.)
the LHS contains the label
and id
values and, importantly, the order is label
first and id
second. This can
be rewritten as:
list(id = "min", label = "minimum") ~ min(.)
If the vector or list is partially named, gt has enough to go on to
disambiguate the unnamed element. So with
fns = c("minimum", label = "min") ~ min(.)
, "min"
is indeed the label
and "minimum"
is taken as the id
value.
A fully named list with three specific elements
We can avoid using a formula if we are satisfied with the default options of
a function (except some of those functions with the na.rm
options, see
above). Instead, a list with the named elements id
, label
, and fn
could
be used. It can look like this:
fns = list(id = "mean_id", label = "average", fn = "mean")
which translates to
list(id = "mean_id", label = "average") ~ mean(., na.rm = TRUE)
Formatting expressions for fmt
Given that we are generating new data in a table, we might also want to
take the opportunity to format those new values right away. We can do this
in the fmt
argument, either with a single expression or a number of them
in a list.
Formatting cells across all groups
We can supply a one-sided (RHS only) or two-sided expression (targeting
groups) to fmt
, and, several can be provided in a list. The RHS will always
contain an expression that uses a formatting function (e.g., fmt_number()
,
fmt_currency()
, etc.) and it must contain an initial .
that stands for
the data object. If performing numeric formatting on all columns in the new
summary rows, it might look something like this:
fmt = ~ fmt_number(., decimals = 1, use_seps = FALSE)
We can use the columns
and rows
arguments that are available in every
formatting function. This allows us to format only a subset of columns or
rows. Summary rows can be targeted by using their ID values and these are
settable within expressions given to fns
(see the Aggregation expressions
for fns
section for details on this). Here's an example with hypothetical
column and row names:
fmt = ~ fmt_number(., columns = num, rows = "mean", decimals = 3)
Formatting cells in specific groups
A two-sided expression is needed for targeting the formatting directives to
specific summary row groups. In this format, the LHS should contain an
expression that resolves to a set of available groups. We can use a single
row group name in quotes, several of those in a vector, or a select helper
expression like starts_with()
or matches()
.
In a situation where summary rows were generated across the row groups named
"group_1"
, "group_2"
, and "group_3"
, we could format all summary cells
in "group_2"
with the following:
fmt = "group_2" ~ fmt_number(., decimals = 1, use_seps = FALSE)
If you wanted to target the latter two groups, this can be done:
fmt = matches("2|3") ~ fmt_number(., decimals = 1, use_seps = FALSE)
Should you need to target a single cell, the LHS expression for group
targeting could be paired with single values for columns
and rows
on the
RHS formatting expression. Like this:
fmt = "group_1" ~ fmt_number(., columns = num, rows = "mean")
Extraction of summary rows
Should we need to obtain the summary data for external purposes,
extract_summary()
can be used with a gt_tbl
object where summary rows
were added via summary_rows()
or grand_summary_rows()
.
Examples
Use a modified version of sp500
dataset to create a gt table with row
groups and row labels. Create the summary rows labeled min
, max
, and
avg
by row group (where each each row group is a week number) with
summary_rows()
.
sp500 |> dplyr::filter(date >= "2015-01-05" & date <= "2015-01-16") |> dplyr::arrange(date) |> dplyr::mutate(week = paste0("W", strftime(date, format = "%V"))) |> dplyr::select(-adj_close, -volume) |> gt( rowname_col = "date", groupname_col = "week" ) |> summary_rows( fns = list( "min", "max", list(label = "avg", fn = "mean") ), fmt = ~ fmt_number(., use_seps = FALSE) )
Using the countrypops
dataset, let's process that a bit before giving it
to gt. We can create a summary rows with totals that appear at the top of
each row group (with side = "top"
). We can define the aggregation with a
list that contains parameters for the summary row label (md("**ALL**")
),
the shared ID value of those rows across groups ("totals"
), and the
aggregation function (expressed as "sum"
, which gt recognizes as the
sum()
function). To top it all off, we'll add background fills to the
summary rows with tab_style()
.
countrypops |> dplyr::filter( country_code_2 %in% c("BR", "RU", "IN", "CN", "FR", "DE", "IT", "GB") ) |> dplyr::filter(year %% 10 == 0) |> dplyr::select(country_name, year, population) |> tidyr::pivot_wider(names_from = year, values_from = population) |> gt(rowname_col = "country_name") |> tab_row_group( label = md("*BRIC*"), rows = c("Brazil", "Russian Federation", "India", "China"), id = "bric" ) |> tab_row_group( label = md("*Big Four*"), rows = c("France", "Germany", "Italy", "United Kingdom"), id = "big4" ) |> row_group_order(groups = c("bric", "big4")) |> tab_stub_indent(rows = everything()) |> tab_header(title = "Populations of the BRIC and Big Four Countries") |> tab_spanner(columns = everything(), label = "Year") |> fmt_number(n_sigfig = 3, suffixing = TRUE) |> summary_rows( fns = list(label = md("**ALL**"), id = "totals", fn = "sum"), fmt = ~ fmt_number(., n_sigfig = 3, suffixing = TRUE), side = "top" ) |> tab_style( locations = cells_summary(), style = cell_fill(color = "lightblue" |> adjust_luminance(steps = +1)) )
Function ID
6-1
Function Introduced
v0.2.0.5
(March 31, 2020)
See Also
Other row addition/modification functions:
grand_summary_rows()
,
row_group_order()
,
rows_add()