time_roll_sum {timeplyr} | R Documentation |
Fast time-based by-group rolling sum/mean - Currently experimental
Description
time_roll_sum and time_roll_mean are efficient methods for calculating a rolling sum and mean, respectively, across many groups and with respect to a date or datetime time index. The rolling window is always aligned "right".
time_roll_window splits x into windows based on the index.
time_roll_window_size returns the window sizes for all indices of x.
time_roll_apply is a generic function that applies any function on a rolling basis with respect to a time index.
time_roll_growth_rate can efficiently calculate by-group rolling growth rates with respect to a date/datetime index.
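To make the right-aligned time window concrete, here is a naive base R sketch. It is not timeplyr's implementation: naive_time_roll_sum is a hypothetical helper, and it assumes a numeric time index with unique, sorted values and a window that covers (time - window, time], or [time - window, time] when the left boundary is closed.

```r
# Hypothetical reference helper, not part of timeplyr.
# For each element i, sum the values whose time falls in the
# right-aligned window ending at time[i]. O(n^2), for illustration only.
naive_time_roll_sum <- function(x, time, window, close_left_boundary = FALSE) {
  vapply(seq_along(x), function(i) {
    lower <- time[i] - window
    in_window <- if (close_left_boundary) {
      time >= lower & time <= time[i]  # [time - window, time]
    } else {
      time > lower & time <= time[i]   # (time - window, time]
    }
    sum(x[in_window], na.rm = TRUE)
  }, numeric(1))
}

x <- c(1, 2, 3, 4)
time <- c(1, 2, 4, 7)  # irregular gaps between observations

open_left <- naive_time_roll_sum(x, time, window = 3)
closed_left <- naive_time_roll_sum(x, time, window = 3,
                                   close_left_boundary = TRUE)
open_left    # 1 3 5 4
closed_left  # 1 3 6 7
```

The two results differ only where an observation sits exactly on the left edge of a window, which is the case close_left_boundary is meant to control.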
Usage
time_roll_sum(
x,
window = Inf,
time = seq_along(x),
weights = NULL,
g = NULL,
partial = TRUE,
close_left_boundary = FALSE,
na.rm = TRUE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA"),
...
)
time_roll_mean(
x,
window = Inf,
time = seq_along(x),
weights = NULL,
g = NULL,
partial = TRUE,
close_left_boundary = FALSE,
na.rm = TRUE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA"),
...
)
time_roll_growth_rate(
x,
window = Inf,
time = seq_along(x),
time_step = NULL,
g = NULL,
partial = TRUE,
close_left_boundary = FALSE,
na.rm = TRUE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA")
)
time_roll_window_size(
time,
window = Inf,
g = NULL,
partial = TRUE,
close_left_boundary = FALSE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA")
)
time_roll_window(
x,
window = Inf,
time = seq_along(x),
g = NULL,
partial = TRUE,
close_left_boundary = FALSE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA")
)
time_roll_apply(
x,
window = Inf,
fun,
time = seq_along(x),
g = NULL,
partial = TRUE,
unlist = FALSE,
close_left_boundary = FALSE,
time_type = getOption("timeplyr.time_type", "auto"),
roll_month = getOption("timeplyr.roll_month", "preday"),
roll_dst = getOption("timeplyr.roll_dst", "NA")
)
Arguments
x |
Numeric vector. |
window |
Time window size (Default is
|
time |
(Optional) time index. |
weights |
Importance weights. Must be the same length as x. Currently, no normalisation of weights occurs. |
g |
Grouping object passed directly to |
partial |
Should calculations be done using partial windows?
Default is |
close_left_boundary |
Should the left boundary be closed?
For example, if you specify |
na.rm |
Should missing values be removed for the calculation?
The default is |
time_type |
If "auto", |
roll_month |
Control how impossible dates are handled when
month or year arithmetic is involved.
Options are "preday", "boundary", "postday", "full" and "NA".
See |
roll_dst |
See |
... |
Additional arguments passed to |
time_step |
An optional but important argument
that follows the same input rules as |
fun |
A function. |
unlist |
Should the output of |
Details
It is much faster if your data are already sorted such that !is.unsorted(order(g, x)) is TRUE.
Growth rates
For growth rates across time, one can use time_step to incorporate gaps in time into the calculation.
For example:
x <- c(10, 20)
t <- c(1, 10)
k <- Inf
time_roll_growth_rate(x, time = t, window = k) = c(1, 2)
whereas
time_roll_growth_rate(x, time = t, window = k, time_step = 1) = c(1, 1.08)
The first is a doubling from 10 to 20, whereas the second implies a growth of 8% for each time step from 1 to 10.
This allows us, for example, to calculate daily growth rates over the last x months, even with missing days.
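The 1.08 figure can be reproduced with plain arithmetic. The base R sketch below is not timeplyr's implementation. It assumes the per-step rate is the total growth raised to the power of 1 over the number of elapsed time steps, which is consistent with the values quoted above. The variable names are illustrative only.

```r
x <- c(10, 20)
t <- c(1, 10)
time_step <- 1

# Total growth over the whole span: a doubling
total_growth <- x[2] / x[1]

# Number of time steps elapsed between the two observations (9 here)
n_steps <- (t[2] - t[1]) / time_step

# Per-step growth rate (assumed to be the geometric average over the steps)
per_step_growth <- total_growth^(1 / n_steps)
round(per_step_growth, 2)  # 1.08
```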
Value
A vector the same length as time.
Examples
library(timeplyr)
library(lubridate)
library(dplyr)
time <- time_seq(today(), today() + weeks(3),
                 time_by = "3 days")
set.seed(99)
x <- sample.int(length(time))
roll_mean(x, window = 7)
roll_sum(x, window = 7)
time_roll_mean(x, window = ddays(7), time = time)
time_roll_sum(x, window = days(7), time = time)
# Alternatively and more verbosely
x_chunks <- time_roll_window(x, window = 7, time = time)
x_chunks
vapply(x_chunks, mean, 0)
# Interval (x - 3, x]
time_roll_sum(x, window = ddays(3), time = time)
# An example with an irregular time series
t <- today() + days(sort(sample(1:30, 20, TRUE)))
time_elapsed(t, days(1)) # See the irregular elapsed time
x <- rpois(length(t), 10)
tibble(x, t) %>%
  mutate(sum = time_roll_sum(x, time = t, window = days(3))) %>%
  time_ggplot(t, sum)
### Rolling mean example with many time series
# Sparse time with duplicates
index <- sort(sample(seq(now(), now() + dyears(3), by = "333 hours"),
                     250, TRUE))
x <- matrix(rnorm(length(index) * 10^3),
            ncol = 10^3, nrow = length(index),
            byrow = FALSE)
zoo_ts <- zoo::zoo(x, order.by = index)
# Normally you might attempt something like this
apply(x, 2,
      function(x) {
        time_roll_mean(x, window = dmonths(1), time = index)
      }
)
# Unfortunately this is too slow and inefficient
# Instead we can pivot it longer and code each series as a separate group
tbl <- ts_as_tibble(zoo_ts)
tbl %>%
  mutate(monthly_mean = time_roll_mean(value, window = dmonths(1),
                                       time = time, g = group))