p_group_by {dtrackr} | R Documentation |
Stratifying your analysis
Description
Grouping a data set acts in the normal way. When tracking a dataframe,
a group_by() operation will sometimes create a large number of groups.
This happens, for example, if a group_by(), summarise() step aggregates
data on a fine scale, e.g. by day in a time series. This is generally a
bad idea when tracking a dataframe, as the resulting flowchart will have
so many branches that it is illegible. dtrackr will detect this issue
and pause tracking the dataframe with a warning. It is up to the user to
resume() tracking once the large number of groups has been resolved,
e.g. using dplyr::ungroup(). This limit is configurable with
options("dtrackr.max_supported_groupings"=XX). The default is 16. See
dplyr::group_by().
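The pause-and-resume behaviour described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: it assumes the default limit of 16 is in effect, and the `id` column and the variable names are hypothetical, introduced only to force more groups than the limit allows.

```r
library(dplyr)
library(dtrackr)

# mtcars has 32 rows, so grouping by a per-row id (a hypothetical column
# added for illustration) gives 32 groups, above the documented default of 16.
cars <- mtcars %>% mutate(id = row_number()) %>% track()

# This step should emit a warning, and tracking is paused from here on.
per_row <- cars %>% group_by(id)

# Resolve the large grouping, then explicitly resume tracking.
resumed <- per_row %>% ungroup() %>% resume()

# Inspect what was recorded.
resumed %>% history()
```

If the fine-grained grouping is intended, the limit can instead be raised beforehand with options("dtrackr.max_supported_groupings"=XX), at the cost of a much larger flowchart.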
Usage
p_group_by(
.data,
...,
.messages = "stratify by {.cols}",
.headline = NULL,
.tag = NULL,
.maxgroups = .defaultMaxSupportedGroupings()
)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
Arguments passed on to
|
.messages |
a set of glue specs. The glue code can use any global variable, or {.cols} which is the columns that are being grouped by. |
.headline |
a headline glue spec. The glue code can use any global variable, or {.cols}. |
.tag |
if you want the summary data from this step in the future then give it a name with .tag. |
.maxgroups |
the maximum number of subgroups allowed before the tracking is paused. |
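As a rough sketch of how the arguments above fit together, the call below sets each of them explicitly on a small grouping. The headline text, the tag name and the limit of 5 are arbitrary values chosen for illustration, and the call assumes that group_by() on a tracked dataframe forwards these arguments to p_group_by(), as it does in the Examples section.

```r
library(dplyr)
library(dtrackr)

grouped <- iris %>%
  track() %>%
  group_by(
    Species,
    .messages = "stratify by {.cols}",  # {.cols} is the set of grouping columns
    .headline = "Grouping",             # illustrative headline text
    .tag = "by_species",                # illustrative tag name
    .maxgroups = 5                      # per-call limit; iris has only 3 species
  )

# Three groups is under the limit, so tracking should continue as normal.
grouped %>% history()
```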
Value
The .data dataframe, grouped.
See Also
dplyr::group_by()
Examples
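library(dplyr)
library(dtrackr)
# Track iris, then stratify by species; {.cols} expands to the grouping columns
tmp = iris %>% track() %>% group_by(Species, .messages="stratify by {.cols}")
# Add a comment stage using {.strata}, then print the recorded history
tmp %>% comment("{.strata}") %>% history()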