flatten_date_intervals {EpiForsk}R Documentation

Flatten Date Intervals

Description

A tidyverse compatible function for simplifying time interval data

Usage

flatten_date_intervals(
  data,
  id,
  in_date,
  out_date,
  status = NULL,
  overlap_handling = "most_recent",
  lag = 0
)

Arguments

data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

id

<tidy-select> One or more unquoted expression naming the id variables in data.

in_date

<data-masking> One unquoted expressions naming the start date variable in data.

out_date

<data-masking> One unquoted expression naming the end date variable in data.

status

<tidy-select> One or more unquoted expressions naming a status variable in data, such as region or hospitalization reason.

overlap_handling

A character naming the method for handling overlaps within an individuals time when status has been specified.

  • "none": No special handling of the overlapping time intervals within person is done.

  • "first": The status mentioned first, that is, has the smallest in_date, dominates.

  • "most_recent" (default): The most recent status, that is, the one with the largest in_date, dominates. When the most recent status is fully contained within an older (and different) status then the out_date associated with the most recent in_date is kept, but the remaining time from the older status is removed. See examples below.

We currently don't have a method that lets the most recent status dominate and then potentially return to an older longer running status. If this is needed, please contact ADLS.

lag

A numeric, giving the number of days allowed between time intervals that should be collapsed into one.

Details

This functions identifies overlapping time intervals within individual and collapses them into distinct and disjoint intervals. When status is specified these intervals are both individual and status specific.

If lag is specified then intervals must be more then lag time units apart to be considered distinct.

Value

A data frame with the id, status if specified and simplified in_date and out_date. The returned data is sorted by id and in_date.

Author(s)

ADLS, EMTH & ASO

Examples


### The flatten function works with both dates and numeric

dat <- data.frame(
   ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
   START = c(1, 2, 5, 3, 6, 2, 3, 6),
   END   = c(3, 3, 7, 4, 9, 3, 5, 8))
dat |> flatten_date_intervals(ID, START, END)

dat <- data.frame(
   ID    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
   START = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
                     "2002-03-14", "1997-02-27",
                     "2008-08-13", "1998-09-23",
                     "2005-01-12", "2007-05-10")),
   END   = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
                     "2005-02-26", "1999-04-16",
                     "2008-08-22", "2015-01-29",
                     "2007-05-07", "2008-12-12")))
dat |> flatten_date_intervals(ID, START, END)



###  Allow for a 5 days lag between

dat |> flatten_date_intervals(ID, START, END, lag = 5)



### Adding status information

dat <- data.frame(
   ID     = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
   START  = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
                      "2002-03-14", "1997-02-27",
                      "2008-08-13", "1998-09-23",
                      "2005-01-12", "2007-05-10")),
   END    = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
                      "2005-02-26", "1999-04-16",
                      "2008-08-22", "2015-01-29",
                     "2007-05-07", "2008-12-12")),
   REGION = c("H", "H", "N", "S", "S", "M", "N", "S", "S"))

# Note the difference between the the different overlap_handling methods
dat |> flatten_date_intervals(ID, START, END, REGION, "none")
dat |> flatten_date_intervals(ID, START, END, REGION, "first")
dat |> flatten_date_intervals(ID, START, END, REGION, "most_recent")


[Package EpiForsk version 0.1.1 Index]