tbl_summary {gtsummary}    R Documentation

Summary table

Description

The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables. Review the tbl_summary vignette for detailed examples.

Usage

tbl_summary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(all_continuous() ~ "{median} ({p25}, {p75})",
                   all_categorical() ~ "{n} ({p}%)"),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)

Arguments

data

(data.frame)
A data frame.

by

(tidy-select)
A single column from data. Summary statistics will be stratified by this variable. Default is NULL.
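A minimal sketch of stratification, using the trial data frame bundled with gtsummary. It inspects tbl$table_body, the internal data frame behind the printed table. The assumption here is that gtsummary creates one statistic column per level of by, named stat_1, stat_2, and so on. That naming is an internal detail, not documented API.

```r
library(gtsummary)

# Stratify by treatment arm; trial$trt has two levels ("Drug A", "Drug B")
tbl_by <- tbl_summary(trial, by = trt, include = c(age, grade))

# Assumption: one statistic column per level of `by` (stat_1, stat_2, ...)
stat_cols <- grep("^stat_", names(tbl_by$table_body), value = TRUE)
print(stat_cols)
```

The by variable itself is not summarized as a row. Its levels become the table's columns.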

label

(formula-list-selector)
Used to override the default labels in the summary table, e.g. list(age = "Age, years"). The default label for each variable is the column's label attribute, attr(., 'label'). If no label has been set, the column name is used.
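The sketch below shows both label sources. A label attribute is set on one column and picked up automatically, and the label argument overrides another column. It reads the header rows from tbl$table_body. The assumption is that these rows carry row_type == "label", which is an internal gtsummary detail.

```r
library(gtsummary)

# Copy two columns from the bundled data and set a label attribute on one
df <- trial[, c("age", "marker")]
attr(df$marker, "label") <- "Marker level (ng/mL)"

# The label argument takes precedence over any attribute on `age`
tbl_lab <- tbl_summary(df, label = list(age = "Age, years"))

# Assumption: variable header rows have row_type == "label"
tb <- tbl_lab$table_body
lab_rows <- tb[tb$row_type == "label", c("variable", "label")]
print(lab_rows)
```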

statistic

(formula-list-selector)
Specifies summary statistics to display for each variable. The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)"). See below for details.

digits

(formula-list-selector)
Specifies how summary statistics are rounded. Values may be either integer(s) or function(s). If not specified, default formatting is assigned via assign_summary_digits(). See below for details.

type

(formula-list-selector)
Specifies the summary type. Accepted values are c("continuous", "continuous2", "categorical", "dichotomous"). If not specified, the default type is assigned via assign_summary_type(). See below for details.

value

(formula-list-selector)
Specifies the level of a variable to display on a single row. The gtsummary type selectors, e.g. all_dichotomous(), cannot be used with this argument. Default is NULL. See below for details.

missing, missing_text, missing_stat

Arguments controlling whether and how missing values are presented:

  • missing: must be one of c("ifany", "no", "always"). "ifany" (the default) adds a missing row only when the variable has missing values, "no" never adds one, and "always" always adds one.

  • missing_text: string giving the text shown on the missing row. Default is "Unknown".

  • missing_stat: statistic shown on the missing row. Default is "{N_miss}". Possible values are N_miss, N_obs, N_nonmiss, p_miss, p_nonmiss.
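The sketch below combines the three missing-value arguments. It uses trial$age, which has missing values, and trial$trt, which has none. It looks up the missing rows in tbl$table_body. The assumption is that these rows carry row_type == "missing", which is an internal gtsummary detail.

```r
library(gtsummary)

tbl_miss <- tbl_summary(
  trial,
  include = c(age, trt),
  missing = "always",                    # add a missing row even for complete variables
  missing_text = "Not recorded",         # replaces the default "Unknown"
  missing_stat = "{N_miss} ({p_miss}%)"  # count and percent missing
)

# Assumption: missing rows have row_type == "missing"
tb <- tbl_miss$table_body
print(tb[tb$row_type == "missing", c("variable", "label", "stat_0")])
```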

sort

(formula-list-selector)
Specifies sorting to perform for categorical variables. Values must be one of c("alphanumeric", "frequency"). Default is all_categorical(FALSE) ~ "alphanumeric".
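A small sketch comparing the two sort orders on toy data where the frequencies are known in advance. The helper level_order() is hypothetical and exists only for this illustration. It assumes that category rows in tbl$table_body carry row_type == "level".

```r
library(gtsummary)

# Toy data: "c" is the most frequent level, "a" the least frequent
df <- data.frame(x = c("a", "b", "b", "c", "c", "c"))

tbl_alpha <- tbl_summary(df)                           # default: alphanumeric
tbl_freq  <- tbl_summary(df, sort = x ~ "frequency")  # most frequent level first

# Hypothetical helper: return the level labels in display order
# (assumes level rows have row_type == "level")
level_order <- function(tbl) {
  tb <- tbl$table_body
  tb$label[tb$row_type == "level"]
}
print(level_order(tbl_alpha))
print(level_order(tbl_freq))
```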

percent

(string)
Indicates the type of percentage to return. Must be one of c("column", "row", "cell"). Default is "column".
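The sketch below uses toy data to show how the denominator changes. With percent = "column", each cell is divided by its by-group total. With percent = "row", each cell is divided by the total for that level across groups. The cells are read from the stat_1 column of tbl$table_body, an assumption about gtsummary's internal column naming.

```r
library(gtsummary)

# Toy data: group A has two "u"; group B has one "u" and one "v"
df <- data.frame(g = c("A", "A", "B", "B"), x = c("u", "u", "u", "v"))

tbl_col <- tbl_summary(df, by = g, percent = "column")  # denominator: group total
tbl_row <- tbl_summary(df, by = g, percent = "row")     # denominator: level total

# Cell for x == "u" in group A: 2/2 = 100% (column) vs 2/3 = 67% (row)
tb_col <- tbl_col$table_body
tb_row <- tbl_row$table_body
print(tb_col$stat_1[tb_col$label == "u"])
print(tb_row$stat_1[tb_row$label == "u"])
```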

include

(tidy-select)
Variables to include in the summary table. Default is everything().

Value

A table of class c('tbl_summary', 'gtsummary')

statistic argument

The statistic argument specifies the summary statistics presented in the table. For example, statistic = list(age ~ "{mean} ({sd})") reports the mean and standard deviation for age, while statistic = list(all_continuous() ~ "{mean} ({sd})") reports the mean and standard deviation for all continuous variables.

The values are interpreted using glue::glue() syntax. Each name between curly brackets is replaced by the formatted value of that statistic. For continuous variables, the name is interpreted as a function name, and the formatted result of that function is placed in the table.

For categorical variables, the following statistics are available to display: {n} (frequency), {N} (denominator), {p} (percent).

For continuous variables, any univariate function may be used. The most commonly used functions are {median}, {mean}, {sd}, {min}, and {max}. Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100. For example, {p25} is computed as quantile(x, probs = 0.25, type = 2).
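Because percentiles use quantile type 2 rather than R's default type 7, results can differ from a plain quantile() call. The sketch below contrasts the two definitions on 1:10 and then requests the same percentiles from tbl_summary(). It forces type = "continuous" so that the number of unique values cannot change the assigned type. It also reads the cell from the stat_0 column of tbl$table_body, an assumption about internal column naming.

```r
library(gtsummary)

x <- 1:10
# Base R: type = 2 and the default type = 7 disagree on these data
q2 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 2)  # 3, 5.5, 8
q7 <- quantile(x, probs = c(0.25, 0.5, 0.75))            # 3.25, 5.5, 7.75

tbl_q <- tbl_summary(
  data.frame(x = x),
  type = x ~ "continuous",                 # do not rely on automatic type detection
  statistic = x ~ "{p25}, {p50}, {p75}",
  digits = x ~ 1                           # one decimal place for all three percentiles
)
cell <- tbl_q$table_body$stat_0[tbl_q$table_body$row_type == "label"]
print(unname(q2)); print(unname(q7)); print(cell)
```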

When the summary type is "continuous2", pass a vector of statistics. Each element of the vector will result in a separate row in the summary table.

For both categorical and continuous variables, the number and proportion of missing and non-missing observations are also available: {N_obs}, {N_miss}, {N_nonmiss}, {p_miss}, and {p_nonmiss}.
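A short sketch that embeds the missing count directly in the statistic string for trial$age, alongside literal text. The separate missing row is suppressed with missing = "no". The cell is read from tbl$table_body$stat_0, an assumption about internal column naming.

```r
library(gtsummary)

tbl_nm <- tbl_summary(
  trial,
  include = age,
  statistic = age ~ "{mean} ({sd}); N missing = {N_miss}",
  missing = "no"  # the count is already in the cell, so drop the extra row
)
print(tbl_nm$table_body$stat_0)
```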

digits argument

The digits argument specifies the number of digits (or the formatting function) used to round each statistic.

The value passed can be a single integer, a vector of integers, a function, or a list of functions. A single integer or function is recycled to the number of statistics presented. For example, if the statistic is "{mean} ({sd})", passing 1, c(1, 1), label_style_number(digits=1), or list(label_style_number(digits=1), label_style_number(digits=1)) gives the same result.
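The recycling rule can be checked directly. This sketch builds one table per equivalent digits specification for "{mean} ({sd})" and collects the formatted age cell from tbl$table_body$stat_0. The column name is an assumption about gtsummary internals.

```r
library(gtsummary)

fmt1 <- label_style_number(digits = 1)
specs <- list(1, c(1, 1), fmt1, list(fmt1, fmt1))

# One table per digits specification; keep the formatted age cell
cells <- vapply(specs, function(d) {
  tbl <- tbl_summary(
    trial,
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = list(age = d),
    missing = "no"
  )
  tbl$table_body$stat_0[1]
}, character(1))
print(cells)
```

All four specifications should yield the same string.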

Named lists are also accepted to change the default formatting for a single statistic, e.g. list(sd = label_style_number(digits=1)).
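A sketch of the named-list form. Only the sd formatting is overridden, to two decimal places, and the mean keeps its default formatting. As before, the cell is read from tbl$table_body$stat_0, an assumed internal column.

```r
library(gtsummary)

tbl_sd <- tbl_summary(
  trial,
  include = age,
  statistic = age ~ "{mean} ({sd})",
  # Override the formatting of sd only; mean keeps its default
  digits = age ~ list(sd = label_style_number(digits = 2)),
  missing = "no"
)
print(tbl_sd$table_body$stat_0)
```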

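type and value arguments

There are four summary types. Use the type argument to override the default type assigned by assign_summary_type().

  • "continuous": the summary is shown on a single row.

  • "continuous2": the summary is shown on two or more rows, one per element of the statistic vector.

  • "categorical": one row is shown per level of the variable.

  • "dichotomous": a categorical variable shown on a single row for one of its levels. Variables coded TRUE/FALSE, 0/1, or yes/no default to this type and display the TRUE, 1, or yes level. For other variables, specify the level to display with the value argument, e.g. value = list(grade ~ "III").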
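The sketch below exercises several summary types in one table using the bundled trial data. It changes age to "continuous2" and uses value to collapse grade to a single row. It leaves response (coded 0/1) to its default type. It inspects the var_type and row_type columns of tbl$table_body, an assumption about gtsummary's internal structure.

```r
library(gtsummary)

tbl_tv <- tbl_summary(
  trial,
  include = c(age, grade, response),
  type = age ~ "continuous2",                            # age on multiple rows
  statistic = age ~ c("{mean} ({sd})", "{min}, {max}"),  # one row per element
  value = grade ~ "III"                                  # show only grade III
)

# Assumption: table_body records each variable's summary type in `var_type`
tb <- tbl_tv$table_body
print(tb[, c("variable", "var_type", "row_type", "label")])
```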

Author(s)

Daniel D. Sjoberg

See Also

See the tbl_summary vignette for a detailed tutorial.

See the table gallery for additional examples.

Review the list, formula, and selector syntax used throughout gtsummary.

Examples

library(gtsummary)
library(dplyr) # for select()

# Example 1 ----------------------------------
trial |>
  select(age, grade, response) |>
  tbl_summary()

# Example 2 ----------------------------------
trial |>
  select(age, grade, response, trt) |>
  tbl_summary(
    by = trt,
    label = list(age = "Patient Age"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits = list(age = c(0, 1))
  )

# Example 3 ----------------------------------
trial |>
  select(age, marker) |>
  tbl_summary(
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}"),
    missing = "no"
  )

[Package gtsummary version 2.0.0 Index]