aggregate_data {iNZightTools} | R Documentation |
Aggregate data by categorical variables
Description
Summarizes non-categorical variables in a dataframe by grouping them based on specified categorical variables and returns the aggregated result along with the tidyverse code used to generate it.
Usage
aggregate_data(
  data,
  group_vars,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)

aggregate_dt(
  data,
  dt,
  dt_comp,
  group_vars = NULL,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)
Arguments
data |
A dataframe or survey design object to be aggregated. |
group_vars |
A character vector specifying the variables in |
summaries |
An unnamed character vector or named list of summary functions to calculate for each group. If unnamed, the vector elements should be names of variables in the dataset for which summary statistics need to be calculated. If named, the names should correspond to the summary functions (e.g., "mean", "sd", "iqr") to be applied to each variable. |
vars |
(Optional) A character vector specifying the names of variables in the dataset for which summary statistics need to be calculated. This argument is ignored if |
names |
(Optional) A character vector or named list providing name templates for the newly created variables. See details for more information. |
quantiles |
(Optional) A numeric vector specifying the desired quantiles (e.g., c(0.25, 0.5, 0.75)). See details for more information. |
dt |
A character string representing the name of the date-time variable in the dataset. |
dt_comp |
A character string specifying the component of the date-time to use for grouping. |
Details
The aggregate_data() function accepts any R function that returns a single-value summary (e.g., mean, var, sd, sum, IQR). By default, new variables are named {var}_{fun}, where {var} is the variable name and {fun} is the summary function used. The user can provide custom names using the names argument, either as a vector of the same length as vars, or as a named list where the names correspond to summary functions (e.g., "mean" or "sd").
The special summary "missing" can be included, which counts the number of missing values in the variable. The default name for this summary is {var}_missing.
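The default {var}_{fun} naming and the "missing" summary can be illustrated with a minimal base R sketch. This is not the code that aggregate_data() generates; the data frame df, its columns, and the helper list funs are hypothetical and exist only for this illustration.

```r
# Hypothetical data with a few missing values, for illustration only.
df <- data.frame(
  group  = c("a", "a", "b", "b", "b"),
  height = c(1.2, NA, 3.1, NA, NA)
)

# Summary functions keyed by the {fun} part of the new column names.
# "missing" counts NA values; mean() drops them here so the result is defined.
funs <- list(
  mean    = function(x) mean(x, na.rm = TRUE),
  missing = function(x) sum(is.na(x))
)
vars <- "height"

# One single-row data frame per group, with columns named {var}_{fun}.
rows <- lapply(split(df[vars], df$group), function(g) {
  out <- list()
  for (v in vars) {
    for (f in names(funs)) {
      out[[paste(v, f, sep = "_")]] <- funs[[f]](g[[v]])
    }
  }
  as.data.frame(out)
})

result <- cbind(group = names(rows), do.call(rbind, rows))
rownames(result) <- NULL
result
```

The resulting columns are group, height_mean and height_missing, which mirrors the naming pattern described above.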
If quantiles are requested, the function calculates the specified quantiles (e.g., 25th, 50th, 75th percentiles), creating new variables for each quantile. To customize the names of these variables, use {p} as a placeholder in the names argument, where {p} represents the quantile value. For example, using names = "Q{p}_{var}" will create variables like "Q0.25_Sepal.Length" for the 25th percentile.
Value
An aggregated dataframe containing the summary statistics for each group, along with the tidyverse code used for the aggregation.
Functions
- aggregate_dt(): Aggregate data by dates and times
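The idea of grouping by a component of a date-time can be sketched in base R as follows. This page does not list the values that dt_comp accepts, so the sketch arbitrarily groups by year and month using format(); the data frame df and its columns are hypothetical, and none of this is the code aggregate_dt() produces.

```r
# Hypothetical data, for illustration only.
df <- data.frame(
  when = as.POSIXct(
    c("2024-01-05 10:00:00", "2024-01-20 12:00:00", "2024-02-03 09:30:00"),
    tz = "UTC"
  ),
  value = c(10, 20, 40)
)

# Derive the grouping component (year-month here), then aggregate as usual.
df$month <- format(df$when, "%Y-%m")
result <- aggregate(value ~ month, data = df, FUN = mean)
result
```

Any additional grouping variables would simply be added to the right-hand side of the formula alongside the derived component.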
Author(s)
Tom Elliott, Owen Jin, Zhaoming Su
Zhaoming Su
See Also
Examples
aggregated <-
  aggregate_data(iris,
    group_vars = c("Species"),
    summaries = c("mean", "sd", "iqr")
  )
code(aggregated)
head(aggregated)