summarise {srvyr} | R Documentation
Summarise multiple values to a single value.
Description
Summarise multiple values to a single value.
Arguments
.data | A tbl_svy object.
... | Name-value pairs of summarizing expressions; see Details.
.groups | Defaults to "drop_last" in srvyr, meaning that the last group is peeled off while any remaining groups are preserved. Other options are "drop", which drops all groups; "keep", which keeps all of them; and "rowwise", which converts the object to a rowwise object (meaning calculations will be performed on each row).
.unpack | Whether to "unpack" named data.frame columns.
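As a rough illustration of the .groups argument, the sketch below groups by two variables and compares the default with .groups = "drop". It assumes the api data from the survey package; the awards column and the object names (grouped, by_default, dropped) are choices made for this example only.

```r
library(srvyr)

# Example data shipped with the survey package
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Two grouping variables, so that "drop_last" leaves one group behind
grouped <- dstrata %>%
  group_by(stype, awards)

# Default: the last grouping variable (awards) is peeled off
by_default <- grouped %>%
  summarise(api00_mn = survey_mean(api00))

# "drop": the result carries no grouping at all
dropped <- grouped %>%
  summarise(api00_mn = survey_mean(api00), .groups = "drop")

# Inspect which grouping variables remain on each result
dplyr::group_vars(by_default)
dplyr::group_vars(dropped)
```

Dropping all groups is convenient when the summary is the final table; keeping the outer group is useful when a further grouped step follows.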
Details
Summarise for tbl_svy objects accepts several specialized functions. Each of these functions takes a variable (or two, in the case of survey_ratio) from the data.frame and by default returns the estimate and its standard error.
The argument vartype selects one or more measures of uncertainty: se for standard error, ci for confidence interval, var for variance, and cv for coefficient of variation. level specifies the level for the confidence interval.
The other arguments correspond to the analogous function arguments from the survey package.
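A minimal sketch of vartype and level in use, assuming the api data from the survey package. The column name api99_mn and the 90% level are arbitrary choices for illustration.

```r
library(srvyr)

# Example data shipped with the survey package
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Request both a standard error and a 90% confidence interval
api99_est <- dstrata %>%
  summarise(
    api99_mn = survey_mean(api99, vartype = c("se", "ci"), level = 0.90)
  )

# The uncertainty columns are named after the estimate,
# with the suffixes _se, _low and _upp
names(api99_est)
```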
The available functions from srvyr are:
survey_mean - Calculate the mean of a numeric variable or the proportion falling into groups, for the entire population or by groups. Based on svymean and svyciprop.
survey_total - Calculate the survey total of the entire population or by groups. Based on svytotal.
survey_prop - Calculate the proportion of the entire population or by groups. Based on svyciprop.
survey_ratio - Calculate the ratio of 2 variables in the entire population or by groups. Based on svyratio.
survey_quantile & survey_median - Calculate quantiles in the entire population or by groups. Based on svyquantile.
unweighted - Calculate an unweighted estimate as you would on a regular tbl_df. Based on dplyr's summarise.
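Several of the functions listed above can be combined in a single call. The sketch below assumes the api data from the survey package; the output column names are made up for this example.

```r
library(srvyr)

# Example data shipped with the survey package
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

by_type <- dstrata %>%
  group_by(stype) %>%
  summarise(
    enroll_total   = survey_total(enroll),          # weighted total
    api_ratio      = survey_ratio(api00, api99),    # ratio of two variables
    api00_median   = survey_median(api00),          # weighted median
    api00_unw_mean = unweighted(mean(api00))        # plain sample mean, no weights
  )

by_type
```

Each survey_* function contributes its estimate plus a standard error column by default, while unweighted contributes a single column.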
You can use expressions both in the ... of summarize and in the arguments to the summarizing functions. Though this is valid syntactically, it also allows you to calculate incorrect results: for example, if you multiply the mean by 100, the standard error is correctly scaled by 100 as well, but the variance is not scaled correctly, because the variance of a rescaled estimate changes by a factor of 100^2.
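One way to sidestep the scaling problem is to rescale the variable inside the summarizing function, so that the variance is estimated for the rescaled values directly. This is a sketch under that assumption, using the api data from the survey package; the names prop_est and pct_est are illustrative.

```r
library(srvyr)

# Example data shipped with the survey package
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Proportion on the 0-1 scale, with its variance
prop_est <- dstrata %>%
  summarise(p = survey_mean(api99 > 700, vartype = "var"))

# Percentage: the multiplication happens inside survey_mean(),
# so the variance is computed for the 0/100 variable itself
pct_est <- dstrata %>%
  summarise(pct = survey_mean(100 * (api99 > 700), vartype = "var"))

# Since Var(c * X) = c^2 * Var(X), this ratio should be 100^2
pct_est$pct_var / prop_est$p_var
```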
Examples
library(srvyr)
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
dstrata %>%
  summarise(api99_mn = survey_mean(api99),
            api00_mn = survey_mean(api00),
            api_diff = survey_mean(api00 - api99))
dstrata_grp <- dstrata %>%
  group_by(stype)
dstrata_grp %>%
  summarise(api99_mn = survey_mean(api99),
            api00_mn = survey_mean(api00),
            api_diff = survey_mean(api00 - api99))
# `dplyr::across` can be used to programmatically summarize multiple columns
# See https://dplyr.tidyverse.org/articles/colwise.html for details
# A basic example of working on 2 columns at once and calculating the total
# of each
total_vars <- c("enroll", "api.stu")
dstrata %>%
  summarize(across(c(all_of(total_vars)), survey_total))
# Expressions are allowed in summarize arguments & inside functions
# Here we can calculate a binary variable on the fly and also multiply by 100
# to get percentages
dstrata %>%
  summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700))
# But be careful, the variance doesn't scale the same way, so this is wrong!
dstrata %>%
  summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700, vartype = "var"))
# Wrong variance!