time_episodes {timeplyr} | R Documentation |
Episodic calculation of time-since-event data
Description
This function assigns episodes to events based on a pre-defined threshold of a chosen time unit.
Usage
time_episodes(
data,
time,
time_by = NULL,
window = 1,
roll_episode = TRUE,
switch_on_boundary = TRUE,
fill = 0,
.add = FALSE,
event = NULL,
time_type = getOption("timeplyr.time_type", "auto"),
.by = NULL
)
Arguments
data |
A data frame. |
time |
Date or datetime variable to use for the episode calculation.
Supply the variable using |
time_by |
Time units used to calculate episode flags.
If
|
window |
Single number defining the episode threshold.
When |
roll_episode |
Logical.
Should episodes be calculated using a rolling or fixed window?
If |
switch_on_boundary |
When an exact amount of time
(specified in |
fill |
Value to fill first time elapsed value. Only applicable when
|
.add |
Should episodic variables be added to the data? |
event |
(Optional) List that encodes which rows are events,
and which aren't.
By default |
time_type |
Time type, either "auto", "duration" or "period".
With larger data, it is recommended to use |
.by |
(Optional). A selection of columns to group by for this operation.
Columns are specified using |
Details
time_episodes() calculates the time elapsed (rolling or fixed) between
successive events, and flags these events as episodes or not based on how much
time has passed.
One example of episodic analysis is tracking disease infections over time.
In this example, a positive test result represents an event and
a new infection represents a new episode.
It is assumed that after a pre-determined amount of time, a positive result represents a new episode of infection.
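The infection example can be sketched in base R. This is only an illustration of the rolling-window idea, not timeplyr's implementation. The dates, the 14-day threshold and the variable names are hypothetical, and the boundary is treated as inclusive on the assumption that this matches switch_on_boundary = TRUE.

```r
# Base R sketch of rolling-window episodes (not timeplyr's implementation).
# Hypothetical data: dates of positive tests for one person.
positives <- as.Date(c("2023-01-01", "2023-01-05", "2023-01-20",
                       "2023-03-01", "2023-03-03"))
window_days <- 14  # assumed threshold in days

# Days elapsed since the previous positive; the first event gets 0.
elapsed <- c(0, as.numeric(diff(positives)))

# A positive starts a new episode if it is the first event, or if at least
# `window_days` have passed since the previous positive (inclusive boundary).
is_new <- seq_along(positives) == 1 | elapsed >= window_days

# The episode ID increases by one at each new episode.
episode <- cumsum(is_new)

data.frame(positives, elapsed, is_new, episode)
```

Here the third and fourth positives each follow a gap of at least 14 days, so each opens a new episode, while the remaining positives stay within the episode already in progress.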
To perform simple time-since-event analysis, where episodes are not of
interest, use time_elapsed() instead.
To find implicit missing gaps in time, set window to 1 and
switch_on_boundary to FALSE. Any event classified as an
episode in this scenario is an event following a gap in time.
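The gap-finding idea can be illustrated in base R, assuming a daily series and assuming that switch_on_boundary = FALSE means strictly more than one time unit must have passed. The data and variable names below are hypothetical.

```r
# Base R sketch of implicit gap detection (not timeplyr's implementation).
# Hypothetical daily series with two missing stretches.
days <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-03",
                  "2023-01-06", "2023-01-07", "2023-01-10"))

# Days elapsed since the previous observation; the first row gets 0.
elapsed_days <- c(0, as.numeric(diff(days)))

# With a window of 1 day and an exclusive boundary, a row follows a gap
# only when strictly more than 1 day has passed.
follows_gap <- elapsed_days > 1

days[follows_gap]
```

Consecutive days are exactly one unit apart, so they are not flagged. Only the rows that come after one or more missing days are.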
The data are always sorted before calculation and then sorted back to the input order.
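A minimal base R sketch of the sort, compute and restore pattern described above follows. It only illustrates the general technique, with hypothetical data, and makes no claim about how timeplyr does this internally.

```r
# Base R sketch: compute on sorted data, then return to the input order.
# Hypothetical unsorted event dates.
times <- as.Date(c("2023-02-10", "2023-01-01", "2023-01-15", "2023-01-03"))

# Permutation that sorts the dates in ascending order.
ord <- order(times)
sorted_times <- times[ord]

# Time elapsed between successive events, computed on the sorted dates.
elapsed_sorted <- c(0, as.numeric(diff(sorted_times)))

# Write each result back to the position its date had in the input.
elapsed <- numeric(length(times))
elapsed[ord] <- elapsed_sorted

data.frame(times, elapsed)
```

Assigning through `elapsed[ord]` is what undoes the sort, so each row keeps its original position while carrying a value computed in time order.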
Four key variables are calculated:
- ep_id - An integer variable signifying which episode each event belongs to. Non-events are assigned NA. ep_id is an increasing integer starting at 1. In the infections scenario, 1 marks positives within the first episode of infection, 2 marks positives within the second episode of infection, and so on.
- ep_id_new - An integer variable signifying the first instance of each new episode. This is an increasing integer where 0 signifies within-episode observations and >= 1 signifies the first instance of the respective episode.
- t_elapsed - The time elapsed since the last event. When roll_episode = FALSE, this becomes the time elapsed since the first event of the current episode. Time units are specified in the time_by argument.
- ep_start - Start date/datetime of the episode.
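To make the four variables concrete, here is a base R sketch of a fixed-window calculation (the roll_episode = FALSE case). It is an illustration under stated assumptions rather than timeplyr's algorithm: the dates and the 14-day window are hypothetical, the boundary is inclusive, the first elapsed value is 0, and the row that opens a new episode records the time elapsed since the start of the previous episode.

```r
# Base R sketch of fixed-window episodes (not timeplyr's implementation).
# Hypothetical, already sorted event dates.
events <- as.Date(c("2023-01-01", "2023-01-05", "2023-01-12",
                    "2023-01-16", "2023-02-20"))
window_days <- 14  # assumed threshold in days

n <- length(events)
ep_id <- integer(n)
ep_id_new <- integer(n)
t_elapsed <- numeric(n)
ep_start <- events  # Date vector, overwritten row by row below

current_id <- 1L
current_start <- events[1]

for (i in seq_len(n)) {
  # Days since the start of the episode in progress (not the previous event).
  t_elapsed[i] <- as.numeric(events[i] - current_start)
  starts_new <- i == 1 || t_elapsed[i] >= window_days
  if (starts_new && i > 1) {
    current_id <- current_id + 1L
    current_start <- events[i]
  }
  ep_id[i] <- current_id
  # 0 within an episode, the episode number on its first row.
  ep_id_new[i] <- if (starts_new) current_id else 0L
  ep_start[i] <- current_start
}

data.frame(events, ep_id, ep_id_new, t_elapsed, ep_start)
```

The fourth event is only 4 days after the third, so a rolling window would keep it in the first episode. Measured from the episode start it is 15 days out, so the fixed window opens a second episode there.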
data.table and collapse are used for speed and efficiency.
Value
A data.frame in the same order as it was given.
See Also
Examples
library(timeplyr)
library(dplyr)
library(nycflights13)
library(lubridate)
library(ggplot2)
# Say we want to flag origin-destination pairs
# that haven't seen departures or arrivals for a week
events <- flights %>%
  mutate(date = as_date(time_hour)) %>%
  group_by(origin, dest) %>%
  time_episodes(date, time_by = "week", window = 1)
# The pooled average time between flights of a specific origin and destination
# is ~ 5.2 hours
# This average is a weighted average of average time between events
# Weighted by the frequency of origin-destination groups (pairs)
# It can be calculated like so:
# flights %>%
#   arrange(origin, dest, time_hour) %>%
#   group_by(origin, dest) %>%
#   mutate(time_diff = time_diff(lag(time_hour), time_hour, "hours")) %>%
#   summarise(n = n(),
#             mean = mean(time_diff, na.rm = TRUE)) %>%
#   ungroup() %>%
#   summarise(pooled_mean = weighted.mean(mean, n, na.rm = TRUE))
events
episodes <- events %>%
  filter(ep_id_new > 1)
nrow(fdistinct(episodes, origin, dest)) # 55 origin-destinations
# As expected, the summer months saw the fewest
# dry periods
episodes %>%
  ungroup() %>%
  time_by(ep_start, time_by = "week",
          .name = "ep_start") %>%
  count() %>%
  ggplot(aes(x = ep_start, y = n)) +
  geom_bar(stat = "identity")