R: Fast time-based by-group rolling sum/mean

time_roll_sum {timeplyr}

R Documentation

Fast time-based by-group rolling sum/mean - Currently experimental

Description

time_roll_sum and time_roll_mean are efficient methods for calculating a rolling sum and mean respectively given many groups and with respect to a date or datetime time index.
It is always aligned "right".
time_roll_window splits x into windows based on the index.
time_roll_window_size returns the window sizes for all indices of x.
time_roll_apply is a generic function that applies any function on a rolling basis with respect to a time index.

time_roll_growth_rate can efficiently calculate by-group rolling growth rates with respect to a date/datetime index.

Usage

time_roll_sum(
  x,
  window = Inf,
  time = seq_along(x),
  weights = NULL,
  g = NULL,
  partial = TRUE,
  close_left_boundary = FALSE,
  na.rm = TRUE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA"),
  ...
)

time_roll_mean(
  x,
  window = Inf,
  time = seq_along(x),
  weights = NULL,
  g = NULL,
  partial = TRUE,
  close_left_boundary = FALSE,
  na.rm = TRUE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA"),
  ...
)

time_roll_growth_rate(
  x,
  window = Inf,
  time = seq_along(x),
  time_step = NULL,
  g = NULL,
  partial = TRUE,
  close_left_boundary = FALSE,
  na.rm = TRUE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA")
)

time_roll_window_size(
  time,
  window = Inf,
  g = NULL,
  partial = TRUE,
  close_left_boundary = FALSE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA")
)

time_roll_window(
  x,
  window = Inf,
  time = seq_along(x),
  g = NULL,
  partial = TRUE,
  close_left_boundary = FALSE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA")
)

time_roll_apply(
  x,
  window = Inf,
  fun,
  time = seq_along(x),
  g = NULL,
  partial = TRUE,
  unlist = FALSE,
  close_left_boundary = FALSE,
  time_type = getOption("timeplyr.time_type", "auto"),
  roll_month = getOption("timeplyr.roll_month", "preday"),
  roll_dst = getOption("timeplyr.roll_dst", "NA")
)

Arguments

`x`	Numeric vector.
`window`	Time window size (Default is `Inf`). Must be one of the following: string, e.g `window = "day"` or `window = "2 weeks"` lubridate duration or period object, e.g. `days(1)` or `ddays(1)`. named list of length one, e.g. `list("days" = 7)`. Numeric vector, e.g. `window = 7`.
`time`	(Optional) time index. Can be a `Date`, `POSIXt`, `numeric`, `integer`, `yearmon`, or `yearqtr` vector.
`weights`	Importance weights. Must be the same length as x. Currently, no normalisation of weights occurs.
`g`	Grouping object passed directly to `collapse::GRP()`. This can for example be a vector or data frame.
`partial`	Should calculations be done using partial windows? Default is `TRUE`.
`close_left_boundary`	Should the left boundary be closed? For example, if you specify `window = "day"` and `time = c(today(), today() + 1)`, a value of `FALSE` would result in the window vector `c(1, 1)` whereas a value of `TRUE` would result in the window vector `c(1, 2)`.
`na.rm`	Should missing values be removed for the calculation? The default is `TRUE`.
`time_type`	If "auto", `periods` are used for the time expansion when lubridate periods are specified or when days, weeks, months or years are specified, and `durations` are used otherwise.
`roll_month`	Control how impossible dates are handled when month or year arithmetic is involved. Options are "preday", "boundary", "postday", "full" and "NA". See `?timechange::time_add` for more details.
`roll_dst`	See `?timechange::time_add` for the full list of details.
`...`	Additional arguments passed to `data.table::frollmean` and `data.table::frollsum`.
`time_step`	An optional but important argument that follows the same input rules as `window`. It is currently only used only in `time_roll_growth_rate`. If this is supplied, the time differences across gaps in time are incorporated into the growth rate calculation. See details for more info.
`fun`	A function.
`unlist`	Should the output of `time_roll_apply` be unlisted with `unlist`? Default is `FALSE`.

Details

It is much faster if your data are already sorted such that !is.unsorted(order(g, x)) is TRUE.

Growth rates

For growth rates across time, one can use time_step to incorporate gaps in time into the calculation.

For example:
x <- c(10, 20)
t <- c(1, 10)
k <- Inf
time_roll_growth_rate(x, time = t, window = k) = c(1, 2) whereas
time_roll_growth_rate(x, time = t, window = k, time_step = 1) = c(1, 1.08)
The first is a doubling from 10 to 20, whereas the second implies a growth of 8% for each time step from 1 to 10.
This allows us for example to calculate daily growth rates over the last x months, even with missing days.

Value

A vector the same length as time.

Examples

library(timeplyr)
library(lubridate)
library(dplyr)

time <- time_seq(today(), today() + weeks(3),
                 time_by = "3 days")
set.seed(99)
x <- sample.int(length(time))

roll_mean(x, window = 7)
roll_sum(x, window = 7)

time_roll_mean(x, window = ddays(7), time = time)
time_roll_sum(x, window = days(7), time = time)

# Alternatively and more verbosely
x_chunks <- time_roll_window(x, window = 7, time = time)
x_chunks
vapply(x_chunks, mean, 0)

# Interval (x - 3 x]
time_roll_sum(x, window = ddays(3), time = time)

# An example with an irregular time series

t <- today() + days(sort(sample(1:30, 20, TRUE)))
time_elapsed(t, days(1)) # See the irregular elapsed time
x <- rpois(length(t), 10)

tibble(x, t) %>%
  mutate(sum = time_roll_sum(x, time = t, window = days(3))) %>%
  time_ggplot(t, sum)


### Rolling mean example with many time series

# Sparse time with duplicates
index <- sort(sample(seq(now(), now() + dyears(3), by = "333 hours"),
                     250, TRUE))
x <- matrix(rnorm(length(index) * 10^3),
            ncol = 10^3, nrow = length(index),
            byrow = FALSE)

zoo_ts <- zoo::zoo(x, order.by = index)

# Normally you might attempt something like this
apply(x, 2,
      function(x){
        time_roll_mean(x, window = dmonths(1), time = index)
      }
)
# Unfortunately this is too slow and inefficient


# Instead we can pivot it longer and code each series as a separate group
tbl <- ts_as_tibble(zoo_ts)

tbl %>%
  mutate(monthly_mean = time_roll_mean(value, window = dmonths(1),
                                       time = time, g = group))

[Package timeplyr version 0.8.1 Index]