R: Flatten Date Intervals

flatten_date_intervals {EpiForsk}

R Documentation

Flatten Date Intervals

Description

A tidyverse-compatible function for simplifying time interval data by collapsing overlapping intervals within each individual.

Usage

flatten_date_intervals(
  data,
  id,
  in_date,
  out_date,
  status = NULL,
  overlap_handling = "most_recent",
  lag = 0
)

Arguments

`data`	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
`id`	<`tidy-select`> One or more unquoted expressions naming the id variables in data.
`in_date`	<`data-masking`> One unquoted expression naming the start date variable in data.
`out_date`	<`data-masking`> One unquoted expression naming the end date variable in data.
`status`	<`tidy-select`> One or more unquoted expressions naming a status variable in data, such as region or hospitalization reason.
`overlap_handling`	A character string naming the method for handling overlaps within an individual's time when `status` has been specified. "none": No special handling of overlapping time intervals within a person is done. "first": The `status` that occurs first, that is, the one with the smallest `in_date`, dominates. "most_recent" (default): The most recent `status`, that is, the one with the largest `in_date`, dominates. When the most recent `status` is fully contained within an older (and different) `status`, the `out_date` associated with the most recent `in_date` is kept, but the remaining time from the older `status` is removed. See the examples below. There is currently no method that lets the most recent status dominate and then potentially return to an older, longer-running status. If this is needed, please contact ADLS.
`lag`	A numeric giving the maximum gap, in days, allowed between time intervals that should be collapsed into one. Defaults to 0.

Details

This function identifies overlapping time intervals within each individual and collapses them into distinct, disjoint intervals. When status is specified, these intervals are both individual-specific and status-specific.

If lag is specified, intervals must be more than lag time units apart to be considered distinct.
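
The collapsing rule described above can be illustrated with a minimal base R sketch. This is not the EpiForsk implementation: `flatten_sketch` is a hypothetical helper that handles a single individual without status, and it assumes that two intervals are merged whenever the gap between them is at most `lag` (so touching intervals merge when `lag = 0`) and that every end is no earlier than its start.

```r
# Hypothetical helper (not part of EpiForsk): collapse the intervals of ONE
# individual. Assumes end >= start and no missing values.
flatten_sketch <- function(start, end, lag = 0) {
  n <- length(start)
  if (n == 0) return(data.frame(in_date = start, out_date = end))
  is_date <- inherits(start, "Date")

  # Sort by start and work on plain numbers (Dates become days since 1970-01-01)
  ord <- order(start, end)
  s <- as.numeric(start)[ord]
  e <- as.numeric(end)[ord]

  # Largest end seen so far; an interval opens a new group only if it starts
  # more than `lag` units after everything before it has ended
  run_end <- cummax(e)
  new_group <- c(TRUE, s[-1] - run_end[-n] > lag)

  # The last row of each group carries that group's final end
  last <- c(which(new_group)[-1] - 1, n)
  out <- data.frame(in_date = s[new_group], out_date = run_end[last])
  if (is_date) {
    out$in_date <- as.Date(out$in_date, origin = "1970-01-01")
    out$out_date <- as.Date(out$out_date, origin = "1970-01-01")
  }
  out
}

# One individual with intervals [1, 3], [2, 3] and [5, 7]
flatten_sketch(c(1, 2, 5), c(3, 3, 7))          # two intervals: [1, 3] and [5, 7]
flatten_sketch(c(1, 2, 5), c(3, 3, 7), lag = 2) # one interval:  [1, 7]

# Applying the helper per individual with split() and lapply()
dat_sketch <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  START = c(1, 2, 5, 3, 6, 2, 3, 6),
  END   = c(3, 3, 7, 4, 9, 3, 5, 8))
res_sketch <- do.call(rbind, lapply(split(dat_sketch, dat_sketch$ID), function(d) {
  cbind(ID = d$ID[1], flatten_sketch(d$START, d$END))
}))
```

Sorting once and comparing each start with the running maximum of earlier ends keeps the sketch vectorised. Whether flatten_date_intervals treats a gap of exactly `lag` the same way should be checked against the package itself.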

Value

A data frame with the id variables, the status variables (if specified), and the simplified in_date and out_date. The returned data is sorted by id and in_date.

Author(s)

ADLS, EMTH & ASO

Examples


### The flatten function works with both dates and numeric values

dat <- data.frame(
   ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
   START = c(1, 2, 5, 3, 6, 2, 3, 6),
   END   = c(3, 3, 7, 4, 9, 3, 5, 8))
dat |> flatten_date_intervals(ID, START, END)

dat <- data.frame(
   ID    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
   START = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
                     "2002-03-14", "1997-02-27",
                     "2008-08-13", "1998-09-23",
                     "2005-01-12", "2007-05-10")),
   END   = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
                     "2005-02-26", "1999-04-16",
                     "2008-08-22", "2015-01-29",
                     "2007-05-07", "2008-12-12")))
dat |> flatten_date_intervals(ID, START, END)



### Allow for a 5-day lag between intervals

dat |> flatten_date_intervals(ID, START, END, lag = 5)



### Adding status information

dat <- data.frame(
   ID     = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
   START  = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
                      "2002-03-14", "1997-02-27",
                      "2008-08-13", "1998-09-23",
                      "2005-01-12", "2007-05-10")),
   END    = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
                      "2005-02-26", "1999-04-16",
                      "2008-08-22", "2015-01-29",
                     "2007-05-07", "2008-12-12")),
   REGION = c("H", "H", "N", "S", "S", "M", "N", "S", "S"))

# Note the differences between the overlap_handling methods
dat |> flatten_date_intervals(ID, START, END, REGION, "none")
dat |> flatten_date_intervals(ID, START, END, REGION, "first")
dat |> flatten_date_intervals(ID, START, END, REGION, "most_recent")

[Package EpiForsk version 0.1.1 Index]