get_time_delay {timeplyr}  R Documentation

Get summary statistics of time delay

Description

Returns a list of summary statistics for the time delay between two date/datetime vectors. This is especially useful for estimating reporting delays, for example.

Usage

get_time_delay(
  data,
  origin,
  end,
  time_by = 1L,
  time_type = getOption("timeplyr.time_type", "auto"),
  min_delay = -Inf,
  max_delay = Inf,
  probs = c(0.25, 0.5, 0.75, 0.95),
  .by = NULL,
  include_plot = TRUE,
  x_scales = "fixed",
  bw = "sj",
  ...
)

Arguments

data

A data frame.

origin

Origin date variable.

end

End date variable.

time_by

Must be one of the following three:

  • String, specifying either the unit or the number and unit, e.g. time_by = "days" or time_by = "2 weeks".

  • Named list of length one, the unit being the name and the number the value of the list, e.g. list("days" = 7). For the vectorized time functions, you can supply multiple values, e.g. list("days" = 1:10).

  • Numeric vector. If time_by is a numeric vector and x is not a date/datetime, then arithmetic is used, e.g. time_by = 1.
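The three accepted forms can be easier to compare side by side. The sketch below uses a hypothetical base R helper, parse_time_by(), that reduces each form to a unit and a number. It is an illustration only and is not how timeplyr parses time_by internally.

```r
# Hypothetical helper (not part of timeplyr): normalise a time_by value
# into a list with a unit and a number.
parse_time_by <- function(time_by) {
  if (is.list(time_by)) {
    # Named list of length one: the name is the unit, the value is the number
    stopifnot(length(time_by) == 1L, !is.null(names(time_by)))
    list(unit = names(time_by), num = time_by[[1L]])
  } else if (is.character(time_by)) {
    # String: either "unit" or "number unit"
    stopifnot(length(time_by) == 1L)
    parts <- strsplit(trimws(time_by), "\\s+")[[1L]]
    if (length(parts) == 1L) {
      list(unit = parts, num = 1)
    } else {
      list(unit = parts[2L], num = as.numeric(parts[1L]))
    }
  } else if (is.numeric(time_by)) {
    # Numeric vector: plain arithmetic, no time unit attached
    list(unit = "numeric", num = time_by)
  } else {
    stop("time_by must be a string, a named list of length one, or numeric")
  }
}

str(parse_time_by("days"))
str(parse_time_by("2 weeks"))
str(parse_time_by(list("days" = 7)))
str(parse_time_by(1))
```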

time_type

If "auto", periods are used for the time expansion when days, weeks, months or years are specified, and durations are used otherwise.
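The difference between periods and durations shows up most clearly across a daylight saving change. The sketch below is a base R analogue only: it assumes that a "period" means a calendar step that keeps the clock time and a "duration" means a fixed number of seconds, and that the Europe/London time zone is available on the system. It does not call timeplyr.

```r
# Noon on the day before UK clocks went forward (26 March 2023)
start <- as.POSIXct("2023-03-25 12:00:00", tz = "Europe/London")

# Duration-like step: exactly 86400 seconds later
dur_step <- seq(start, by = "day", length.out = 2)

# Period-like step: the same clock time on the next calendar day
per_step <- seq(start, by = "DSTday", length.out = 2)

# Elapsed hours for each kind of step
dur_hours <- as.numeric(difftime(dur_step[2], dur_step[1], units = "hours"))
per_hours <- as.numeric(difftime(per_step[2], per_step[1], units = "hours"))

c(duration = dur_hours, period = per_hours)
```

The two steps differ by one hour because the calendar day that contains the clock change is only 23 hours long.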

min_delay

The minimum acceptable delay. All delays less than this are removed before calculation. Default is min_delay = -Inf.

max_delay

The maximum acceptable delay. All delays greater than this are removed before calculation. Default is max_delay = Inf.

probs

Probabilities used in the quantile summary. Default is probs = c(0.25, 0.5, 0.75, 0.95).
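To make the roles of min_delay, max_delay and probs concrete, here is a minimal base R sketch of the same idea on toy data. The dates are made up for illustration and the code is an assumption about the general approach, not timeplyr's implementation.

```r
# Toy data (hypothetical values)
origin <- as.Date(c("2024-01-01", "2024-01-03", "2024-01-05",
                    "2024-01-10", "2024-01-12"))
end    <- as.Date(c("2024-01-04", "2024-01-02", "2024-01-15",
                    "2024-01-17", "2024-03-12"))

# Delay in days between the two date vectors
delay <- as.numeric(difftime(end, origin, units = "days"))

# Drop delays below min_delay or above max_delay before summarising
min_delay <- 0
max_delay <- 30
kept <- delay[delay >= min_delay & delay <= max_delay]

# Quantile summary at the requested probabilities
probs <- c(0.25, 0.5, 0.75, 0.95)
delay_quantiles <- quantile(kept, probs = probs)
delay_quantiles
```

Here the negative delay and the 60-day delay are excluded, so the quantiles are computed from the three remaining values.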

.by

(Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.
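Grouping means the delay summary is computed separately within each group. A rough base R equivalent, using a made-up data frame with hypothetical column names, might look like this:

```r
# Toy delays for two hypothetical groups
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  delay = c(1, 2, 3, 10, 20, 30)
)

probs <- c(0.25, 0.5, 0.75, 0.95)

# Split the delays by group, then summarise each group on its own
by_group <- lapply(split(df$delay, df$group), quantile, probs = probs)
by_group
```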

include_plot

Should a ggplot graph of delay distributions be included in the output?

x_scales

Option to control how the x-axis is displayed for multiple facets. Choices are "fixed" or "free_x".

bw

The smoothing bandwidth selector for the kernel density estimator. If numeric, the standard deviation of the smoothing kernel. If character, a rule to choose the bandwidth. See ?stats::bw.nrd for more details. The default is "sj", which implements the Sheather & Jones (1991) method, as recommended in ?stats::density. This differs from the default of stats::density(), "nrd0", which uses Silverman's rule of thumb.
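The effect of the selector can be checked directly with stats::density(). The sketch below uses simulated, right-skewed delays (hypothetical data) and compares the Sheather & Jones bandwidth with Silverman's rule of thumb. It uses only base R and does not call timeplyr.

```r
set.seed(42)

# Simulated right-skewed delays (hypothetical data)
delay <- rgamma(500, shape = 2, rate = 0.2)

# Bandwidths chosen by each selector
bw_sj  <- stats::bw.SJ(delay)    # Sheather & Jones
bw_rot <- stats::bw.nrd0(delay)  # Silverman's rule of thumb

# The same selectors requested by name through density()
dens_sj  <- stats::density(delay, bw = "SJ")
dens_rot <- stats::density(delay, bw = "nrd0")

# A numeric bw is used as the kernel standard deviation directly
dens_num <- stats::density(delay, bw = 2)

c(sj = dens_sj$bw, nrd0 = dens_rot$bw, numeric = dens_num$bw)
```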

...

Further arguments to be passed on to ggplot2::geom_density().

Value

A list containing summary data, summary statistics and an optional ggplot.
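The examples describe the delay element as holding the ECDF and frequency by delay. As a rough idea of what such a table contains, here is a base R sketch on toy values. The column names are hypothetical and may not match what get_time_delay() returns.

```r
# Toy delays (hypothetical values)
delay <- c(2, 3, 3, 5, 5, 5, 8)

# Frequency of each distinct delay
freq <- table(delay)

# One row per distinct delay: count and cumulative proportion (ECDF)
delay_tbl <- data.frame(
  delay = as.numeric(names(freq)),
  n     = as.integer(freq),
  ecdf  = cumsum(as.integer(freq)) / length(delay)
)
delay_tbl
```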

Examples

library(timeplyr)
library(outbreaks)
library(dplyr)

ebola_linelist <- ebola_sim_clean$linelist

# Incubation period distribution

# 95% of individuals experienced an incubation period of <= 26 days
inc_distr_days <- ebola_linelist %>%
  get_time_delay(date_of_infection,
                 date_of_onset,
                 time_by = "days")
head(inc_distr_days$data)
inc_distr_days$unit
inc_distr_days$num
inc_distr_days$summary
head(inc_distr_days$delay) # ECDF and freq by delay
inc_distr_days$plot

# Can change bandwidth selector
inc_distr_days <- ebola_linelist %>%
  get_time_delay(date_of_infection,
                 date_of_onset,
                 time_by = "day",
                 bw = "nrd")
inc_distr_days$plot

# Can choose any time units
inc_distr_weeks <- ebola_linelist %>%
  get_time_delay(date_of_infection,
                 date_of_onset,
                 time_by = "weeks",
                 bw = "nrd")
inc_distr_weeks$plot


[Package timeplyr version 0.8.1 Index]