aggregate_Date {LightLogR} | R Documentation |
Aggregate dates to a single day
Description
Condenses a dataset by aggregating the data to a single day per group, with a resolution of choice (unit). aggregate_Date() is opinionated in the sense that it sets default handlers for each data type of numeric, character, logical, and factor. These can be overwritten by the user. Columns that do not fall into one of these categories need to be handled individually by the user (... argument) or will be removed during aggregation. If no unit is specified, the data will simply be aggregated to the most common interval (dominant.epoch) in every group. aggregate_Date() is especially useful for summary plots that show an average day.
Usage
aggregate_Date(
  dataset,
  Datetime.colname = Datetime,
  unit = "none",
  type = c("round", "floor", "ceiling"),
  date.handler = stats::median,
  numeric.handler = mean,
  character.handler = function(x) names(which.max(table(x, useNA = "ifany"))),
  logical.handler = function(x) mean(x) >= 0.5,
  factor.handler = function(x) factor(names(which.max(table(x, useNA = "ifany")))),
  ...
)
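To make the default handlers from the signature above concrete, the following minimal sketch applies each of them to a small vector in base R. The handler definitions are copied from Usage; the example vectors (lux, state, worn, site, dates) are hypothetical and stand in for the values that share one time point within a group.

```r
# Default handlers, copied from the Usage signature
numeric.handler   <- mean
character.handler <- function(x) names(which.max(table(x, useNA = "ifany")))
logical.handler   <- function(x) mean(x) >= 0.5
factor.handler    <- function(x) factor(names(which.max(table(x, useNA = "ifany"))))
date.handler      <- stats::median

# Hypothetical values (assumed for illustration only)
lux   <- c(1, 2, 6)
state <- c("sleep", "wake", "wake")
worn  <- c(TRUE, FALSE, TRUE)
site  <- factor(c("indoor", "outdoor", "outdoor"))
dates <- as.Date(c("2023-08-01", "2023-08-02", "2023-08-03"))

numeric.handler(lux)      # arithmetic mean of the values
character.handler(state)  # most frequent value
logical.handler(worn)     # TRUE if at least half of the values are TRUE
factor.handler(site)      # most frequent level, returned as a factor
date.handler(dates)       # median of the dates
```

Any of these can be swapped for another function of one vector that returns a single value, for example stats::median for numeric columns.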
Arguments
dataset |
A light logger dataset. Expects a |
Datetime.colname |
Column name that contains the datetime. Defaults to Datetime. |
unit |
Unit of binning. See |
type |
One of "round", "floor", or "ceiling". |
date.handler |
A function that calculates the aggregated day for each group. By default, this is set to stats::median. |
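numeric.handler, character.handler, logical.handler, factor.handler |
Functions that handle the respective data types. The default handlers calculate the mean for numeric columns and the most frequent value for character and factor columns; the logical handler returns TRUE when at least half of the values are TRUE. |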
... |
arguments given over to |
Details
aggregate_Date() splits the Datetime column into a Date.data and a Time.data column. It will create subgroups for each Time.data present in a group and aggregate each group into a single day, then remove the sub grouping.
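The split-and-aggregate idea described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the data frame dat and its values are hypothetical, time of day is stored as a string, and the date handler is applied once to all dates of a single group.

```r
# Two days of hypothetical readings at three clock times each (UTC)
dat <- data.frame(
  Datetime = as.POSIXct(
    c("2023-08-01 08:00:00", "2023-08-01 12:00:00", "2023-08-01 16:00:00",
      "2023-08-03 08:00:00", "2023-08-03 12:00:00", "2023-08-03 16:00:00"),
    tz = "UTC"
  ),
  MEDI = c(10, 100, 50, 30, 300, 70)
)

# Split the datetime into a date part and a time-of-day part
dat$Date.data <- as.Date(dat$Datetime, tz = "UTC")
dat$Time.data <- format(dat$Datetime, "%H:%M:%S")

# Aggregate each time of day across dates with the default numeric handler
avg <- aggregate(MEDI ~ Time.data, data = dat, FUN = mean)

# Collapse the dates to one day with the default date handler
one_day <- stats::median(dat$Date.data)

# Recombine date and time into a single "average day"
avg$Datetime <- as.POSIXct(paste(one_day, avg$Time.data), tz = "UTC")
avg
```

The result has one row per time of day, all carrying the same date, which is the shape a plot of an average day needs.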
Use the ... argument to create summary statistics for each group, e.g. maximum or minimum values for each time point group.
Performing aggregate_Datetime() with any unit and then aggregate_Date() with a unit of "none" is equivalent to just using aggregate_Date() with that unit directly (provided the other arguments are set the same between the functions). Disentangling the two functions can be useful to split the computational cost for very small instances of unit in large datasets. It can also be useful to apply different handlers when aggregating data to the desired unit of time, before further aggregation to a single day, as these handlers as well as ... are used twice if the unit is not set to "none".
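The one-step and two-step routes described above might look as follows. This sketch assumes that LightLogR is installed, that it provides the pipe and the sample.data.environment dataset used in the Examples, and that aggregate_Datetime() accepts unit and numeric.handler in the same way as aggregate_Date(); the object names are hypothetical.

```r
library(LightLogR)

# One call: bin to 15 minutes and collapse to a single day
one_step <- sample.data.environment %>%
  aggregate_Date(unit = "15 mins")

# Two calls: bin first, then collapse the binned data to a single day
two_step <- sample.data.environment %>%
  aggregate_Datetime(unit = "15 mins") %>%
  aggregate_Date(unit = "none")

# Two calls with a different numeric handler for the binning step only
# (assumes aggregate_Datetime() has a numeric.handler argument)
median_bins <- sample.data.environment %>%
  aggregate_Datetime(unit = "15 mins", numeric.handler = stats::median) %>%
  aggregate_Date(unit = "none")
```

Splitting the calls also allows the binned intermediate result to be stored and reused, so the more expensive binning step runs only once.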
Value
A tibble with aggregated Datetime data, at maximum one day per group. If the handler arguments capture all column types, the number of columns will be the same as in the input dataset.
Examples
library(LightLogR)
library(ggplot2)

# gg_days without aggregation
sample.data.environment %>%
  gg_days()

# with daily aggregation
sample.data.environment %>%
  aggregate_Date() %>%
  gg_days()

# with daily aggregation and a different time aggregation
sample.data.environment %>%
  aggregate_Date(unit = "15 mins", type = "floor") %>%
  gg_days()

# adding further summary statistics about the range of MEDI
sample.data.environment %>%
  aggregate_Date(unit = "15 mins", type = "floor",
                 MEDI_max = max(MEDI),
                 MEDI_min = min(MEDI)) %>%
  gg_days() +
  geom_ribbon(aes(ymin = MEDI_min, ymax = MEDI_max), alpha = 0.5)