tbl_svysummary {gtsummary}    R Documentation
Create a table of summary statistics from a survey object
Description
The tbl_svysummary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables, taking into account survey weights and design.
Usage
tbl_svysummary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~
    "{n} ({p}%)"),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)
Arguments
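data: a survey design object, for example one created with survey::svydesign().
by: a single column from data; summary statistics are reported separately for each of its levels.
label: variable labels to display in the table.
statistic: the statistics to report for each variable; see "statistic argument" below.
digits: the number of decimal places used to round the statistics.
type: the summary type of each variable; see "type and value arguments" below.
value: the level of a dichotomous variable to display; see "type and value arguments" below.
missing, missing_text, missing_stat: arguments dictating how and if missing values are presented:
  - missing: one of "ifany" (the default), "no", or "always".
  - missing_text: the text shown on the missing-value row (default "Unknown").
  - missing_stat: the statistic shown on the missing-value row (default "{N_miss}").
sort: how the levels of categorical variables are ordered, "alphanumeric" (the default) or "frequency".
percent: the type of percentage reported for categorical variables, one of "column" (the default), "row", or "cell".
include: the variables from data to include in the table (default everything()).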
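A minimal sketch of the sort and percent arguments, assuming the gtsummary and survey packages are installed. It reuses the apiclus1 cluster design from Example 2; the choice of "cell" percentages and frequency sorting is only for illustration.

```r
library(gtsummary)
data(api, package = "survey")

# Cluster design reused from Example 2
design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl <- tbl_svysummary(
  design,
  by = both,                        # columns split by the No/Yes "met both targets" flag
  include = stype,
  percent = "cell",                 # percentages relative to the whole weighted table
  sort = list(stype ~ "frequency")  # levels ordered by decreasing weighted frequency
)
tbl
```

Using "row" instead of "cell" would make each row of stype sum to 100% across the levels of both.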
Value
A 'tbl_svysummary'
object
statistic argument
The statistic argument specifies the statistics presented in the table. The input is a list of formulas that specify the statistics to report. For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})") would report the mean and standard deviation for all continuous variables. A statistic name that appears between curly brackets will be replaced with the numeric statistic (see glue::glue()).
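A minimal sketch of the formula-list syntax, assuming the gtsummary and survey packages are installed. It reuses the apiclus1 cluster design from Example 2 and requests a weighted mean (SD) for continuous variables alongside weighted n (%) for categorical ones.

```r
library(gtsummary)
data(api, package = "survey")

# Cluster design reused from Example 2
design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl <- tbl_svysummary(
  design,
  include = c(api00, stype),
  statistic = list(
    all_continuous() ~ "{mean} ({sd})",  # api00: weighted mean (SD)
    all_categorical() ~ "{n} ({p}%)"     # stype: weighted n (%)
  )
)
tbl
```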
For categorical variables, the following statistics are available to display:
- {n}: frequency
- {N}: denominator, or cohort size
- {p}: proportion
- {p.std.error}: standard error of the sample proportion, computed with survey::svymean()
- {deff}: design effect of the sample proportion, computed with survey::svymean()
- {n_unweighted}: unweighted frequency
- {N_unweighted}: unweighted denominator
- {p_unweighted}: unweighted formatted percentage
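To make the weighted versus unweighted distinction concrete, the sketch below uses a hypothetical four-row data set (the toy data frame and its weights are invented for illustration) and computes the underlying quantities directly with the survey package before requesting them through tbl_svysummary(). With weights 1 and 3 on the two F rows and 2 and 4 on the two M rows, the weighted frequencies should be 4 and 6 (40% and 60%), while the unweighted frequencies are 2 and 2 (50% each).

```r
library(survey)
library(gtsummary)

# Hypothetical toy data: four respondents with unequal sampling weights
toy <- data.frame(
  sex = factor(c("F", "F", "M", "M")),
  w   = c(1, 3, 2, 4)
)
design <- svydesign(ids = ~1, weights = ~w, data = toy)

# Weighted frequency per level (sum of the weights): F = 4, M = 6
n_w <- svytable(~sex, design)
# Unweighted frequency per level (plain row counts): F = 2, M = 2
n_u <- table(toy$sex)
# Weighted proportions (F = 0.4, M = 0.6) and their standard errors
p <- svymean(~sex, design)
p_se <- SE(p)

# Weighted and unweighted statistics side by side in one table
tbl <- tbl_svysummary(
  design,
  include = sex,
  statistic = list(sex ~ "{n} ({p}%); unweighted {n_unweighted} ({p_unweighted}%)")
)
tbl
```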
For continuous variables, the following statistics are available to display:
- {median}: median
- {mean}: mean
- {mean.std.error}: standard error of the sample mean, computed with survey::svymean()
- {deff}: design effect of the sample mean, computed with survey::svymean()
- {sd}: standard deviation
- {var}: variance
- {min}: minimum
- {max}: maximum
- {p##}: any integer percentile, where ## is an integer from 0 to 100
- {sum}: sum
Unlike tbl_summary(), it is not possible to pass a custom function.
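A minimal sketch combining several of the continuous statistics, including the {p##} percentile pattern, assuming the gtsummary and survey packages are installed. The apiclus1 design is the one from Example 2; the particular statistic string is only an illustration.

```r
library(gtsummary)
data(api, package = "survey")

# Cluster design reused from Example 2
design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl <- tbl_svysummary(
  design,
  include = c(api00, api99),
  statistic = list(
    # weighted mean, its design-based standard error, and the 10th/90th percentiles
    all_continuous() ~ "{mean} (SE {mean.std.error}); P10 {p10}, P90 {p90}"
  )
)
tbl
```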
For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display:
- {N_obs}: total number of observations
- {N_miss}: number of missing observations
- {N_nonmiss}: number of non-missing observations
- {p_miss}: percentage of observations missing
- {p_nonmiss}: percentage of observations not missing
- {N_obs_unweighted}: unweighted total number of observations
- {N_miss_unweighted}: unweighted number of missing observations
- {N_nonmiss_unweighted}: unweighted number of non-missing observations
- {p_miss_unweighted}: unweighted percentage of observations missing
- {p_nonmiss_unweighted}: unweighted percentage of observations not missing
Note that for categorical variables, {N_obs}, {N_miss} and {N_nonmiss} refer to the total number of observations, the number of missing observations, and the number of non-missing observations in the denominator, not at each level of the categorical variable.
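A minimal sketch of the missing-value statistics and the missing, missing_text and missing_stat arguments, assuming the gtsummary and survey packages are installed. The toy data frame is hypothetical: its weights sum to 8, and the respondent with a missing score has weight 1, so the weighted counts should be 7 non-missing out of 8, against 4 out of 5 unweighted.

```r
library(gtsummary)

# Hypothetical toy data: `score` is missing for the third respondent
toy <- data.frame(
  score = c(12.5, 20.1, NA, 35.2, 41.0),
  w     = c(1, 2, 1, 3, 1)
)
design <- survey::svydesign(ids = ~1, weights = ~w, data = toy)

tbl <- tbl_svysummary(
  design,
  include = score,
  # only four distinct non-missing values, so force a continuous summary
  type = list(score ~ "continuous"),
  statistic = list(
    score ~ "{mean}; {N_nonmiss}/{N_obs} weighted, {N_nonmiss_unweighted}/{N_obs_unweighted} unweighted"
  ),
  missing = "always",                   # always add a missing-value row
  missing_text = "Not reported",        # label of that row
  missing_stat = "{N_miss_unweighted}"  # raw count shown on that row
)
tbl
```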
type and value arguments
There are four summary types. Use the type argument to change the default summary types.
- "continuous": summaries are shown on a single row. Most numeric variables default to summary type continuous.
- "continuous2": summaries are shown on two or more rows.
- "categorical": multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To summarize a numeric variable that defaulted to categorical as continuous instead, use type = list(varname ~ "continuous").
- "dichotomous": categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show").
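A minimal sketch of overriding the default summary types, assuming the gtsummary and survey packages are installed. It uses the apiclus1 design from Example 2: api00 is shown as "continuous2" with one row per statistic, and stype (levels E, H and M) is collapsed to a single dichotomous row for high schools ("H") via value. sch.wide is a No/Yes factor and should therefore default to dichotomous, displaying the "Yes" row.

```r
library(gtsummary)
data(api, package = "survey")

# Cluster design reused from Example 2
design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl <- tbl_svysummary(
  design,
  include = c(api00, sch.wide, stype),
  type = list(
    api00 ~ "continuous2",  # a header row plus one row per statistic below
    stype ~ "dichotomous"   # a single row for the level chosen in `value`
  ),
  # stype is not coded yes/no, so the level to show must be given explicitly
  value = list(stype ~ "H"),
  statistic = list(api00 ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"))
)
tbl
```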
Author(s)
Joseph Larmarange
Examples
library(gtsummary)

# Example 1 ----------------------------------
survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) |>
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age))

# Example 2 ----------------------------------
# A dataset with a complex design
data(api, package = "survey")
survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) |>
  tbl_svysummary(by = "both", include = c(api00, stype)) |>
  modify_spanning_header(all_stat_cols() ~ "**Met both targets**")