stat_summarise {timeplyr}

R Documentation

Fast grouped statistical summary for data frames.

Description

Computes common grouped summary statistics quickly. The calculations are performed with the collapse and data.table packages.
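As a rough illustration of the kind of grouped computation involved, the default statistics ("n", "nmiss", "ndistinct") can be sketched directly in data.table. This is not timeplyr's internal code. It assumes that "n" is the group size and that "ndistinct" ignores NA values (matching the default na.rm = TRUE).

```r
library(data.table)

dt <- as.data.table(iris)

# Sketch only, not timeplyr's implementation: per-group count, missing count
# and distinct count for one column.
# Assumes "n" = group size and that NA is excluded from the distinct count.
# keyby (rather than by) sorts the groups, mirroring sort = TRUE.
res <- dt[, .(n = .N,
              nmiss = sum(is.na(Sepal.Length)),
              ndistinct = uniqueN(Sepal.Length, na.rm = TRUE)),
          keyby = Species]
res
```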

Usage

stat_summarise(
  data,
  ...,
  stat = c("n", "nmiss", "ndistinct"),
  q_probs = NULL,
  na.rm = TRUE,
  sort = df_group_by_order_default(data),
  .count_name = NULL,
  .names = NULL,
  .by = NULL,
  .cols = NULL,
  inform_stats = TRUE,
  as_tbl = FALSE
)

.stat_fns

Arguments

`data`	A data frame.
`...`	Variables to apply the statistical functions to. Tidy data-masking applies.
`stat`	A character vector of statistical summaries to apply. This can be one or more of the following: "n", "nmiss", "ndistinct", "min", "max", "mean", "first", "last", "sd", "var", "mode", "median", "sum", "prop_complete".
`q_probs`	(Optional) Quantile probabilities. If supplied, the quantiles are computed with `q_summarise()` and added to the result.
`na.rm`	Should `NA` values be removed? Default is `TRUE`.
`sort`	Should groups be sorted? The default is `df_group_by_order_default(data)`.
`.count_name`	Name of the count column. If `NULL` (the default), "n" is used.
`.names`	An optional glue specification passed to `glue::glue()`. If `.names = NULL`, names are chosen as follows: with 1 variable, the function name is used, i.e. `.names = "{.fn}"`; with multiple variables and 1 function, the variable names are used, i.e. `.names = "{.col}"`; with multiple variables and multiple functions, `"{.col}_{.fn}"` is used.
`.by`	(Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.
`.cols`	(Optional) Alternative to `...` that accepts a named character vector or a numeric vector. It is recommended when speed matters.
`inform_stats`	Should available stat functions be displayed at the start of each session? Default is `TRUE`.
`as_tbl`	Should the result be a `tibble`? Default is `FALSE`.
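The default `.names` rule can be sketched as a small helper. `default_stat_names()` is hypothetical and not part of timeplyr; it only reproduces the three cases described for `.names = NULL`. The column order in the multi-variable, multi-function case (all functions for the first variable, then the next) is an assumption.

```r
# Hypothetical helper (not part of timeplyr): default result names used
# when `.names = NULL`.
default_stat_names <- function(cols, fns) {
  if (length(cols) == 1) {
    # One variable: "{.fn}"
    fns
  } else if (length(fns) == 1) {
    # Several variables, one function: "{.col}"
    cols
  } else {
    # Several variables and functions: "{.col}_{.fn}"
    # Assumed order: every function for the first variable, then the next.
    paste(rep(cols, each = length(fns)), fns, sep = "_")
  }
}

default_stat_names("Sepal.Length", c("n", "mean"))
default_stat_names(c("Sepal.Width", "Petal.Width"), c("min", "max"))
```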

Format

.stat_fns

An object of class character of length 14.

Details

stat_summarise() can apply multiple functions to multiple variables.

stat_summarise() is roughly equivalent to
data %>% group_by(...) %>% summarise(across(..., list(...)))
but is faster and more efficient, at the cost of supporting only the fixed set of statistics listed under `stat`.
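For comparison, the second example below can be written as the plain dplyr pipeline that the equivalence above describes. This is a sketch assuming `na.rm = TRUE` (the default); `stat_summarise()` may differ in column order, in extra columns such as a count column, and in the class of the result.

```r
library(dplyr)

# Plain dplyr counterpart of
# stat_summarise(across(contains("Width")), stat = c("min", "max", "mean", "sd"))
# on iris grouped by Species. Result columns follow "{.col}_{.fn}".
ref <- iris %>%
  group_by(Species) %>%
  summarise(across(contains("Width"),
                   list(min  = ~ min(.x, na.rm = TRUE),
                        max  = ~ max(.x, na.rm = TRUE),
                        mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))),
            .groups = "drop")
ref
```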

Value

A data.table (or a tibble if `as_tbl = TRUE`) with one row per group containing the summary values.

Examples

library(timeplyr)
library(dplyr)

stat_df <- iris %>%
  stat_summarise(Sepal.Length, .by = Species)
# Join quantile info too
q_df <- iris %>%
  q_summarise(Sepal.Length, .by = Species)
summary_df <- left_join(stat_df, q_df, by = "Species")
summary_df

# Multiple cols
iris %>%
  group_by(Species) %>%
  stat_summarise(across(contains("Width")),
                 stat = c("min", "max", "mean", "sd"))

[Package timeplyr version 0.8.1 Index]