flatten_date_intervals {EpiForsk} | R Documentation |
Flatten Date Intervals
Description
A tidyverse-compatible function for simplifying time interval data.
Usage
flatten_date_intervals(
data,
id,
in_date,
out_date,
status = NULL,
overlap_handling = "most_recent",
lag = 0
)
Arguments
data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
id |
< |
in_date |
< |
out_date |
< |
status |
< |
overlap_handling |
A character string naming the method for handling overlaps within an individual's time when status is specified. The examples below use "none", "first", and "most_recent" (the default).
We currently don't have a method that lets the most recent status dominate and then potentially return to an older, longer-running status. If this is needed, please contact ADLS. |
lag |
A numeric, giving the number of days allowed between time intervals that should be collapsed into one. |
Details
This function identifies overlapping time intervals within each individual and collapses them into distinct, disjoint intervals. When status is specified, these intervals are both individual and status specific. If lag is specified, intervals must be more than lag time units apart to be considered distinct.
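The collapsing rule described above can be illustrated with a minimal base R sketch for a single individual. The helper `merge_intervals()` is hypothetical and is not part of EpiForsk. It assumes that two intervals are merged whenever the gap between them is at most `lag`, and it ignores status. The package's actual implementation may differ.

```r
# Hypothetical helper, not part of EpiForsk: collapse the intervals of ONE
# individual. Assumption: intervals are merged when the gap between them
# is at most `lag`; status handling is not covered.
merge_intervals <- function(start, end, lag = 0) {
  # Sort intervals by start, then by end
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]

  # Largest end point seen so far (as.numeric so that Dates work too)
  run_end <- cummax(as.numeric(end))

  # A new group starts when the gap to all earlier intervals exceeds lag
  gap <- as.numeric(start[-1]) - run_end[-length(run_end)]
  new_group <- c(TRUE, gap > lag)
  group <- cumsum(new_group)

  data.frame(
    in_date = start[new_group],
    # Latest end point within each group, keeping the original class
    out_date = unname(do.call(c, lapply(split(end, group), max)))
  )
}

# Intervals for ID 1 from the first example: (1, 3), (2, 3), (5, 7)
merge_intervals(c(1, 2, 5), c(3, 3, 7))
merge_intervals(c(1, 2, 5), c(3, 3, 7), lag = 2)
```

With `lag = 0`, the first two intervals overlap and are merged while the third stays separate. With `lag = 2`, the gap of 2 between 3 and 5 no longer counts as a break, so all three collapse into one interval.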
Value
A data frame with the id, status (if specified), and simplified in_date and out_date. The returned data is sorted by id and in_date.
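The properties of the returned data can be checked with a small base R sketch. The helper `is_flat()` is hypothetical and is not part of EpiForsk. It assumes a result without status and treats "disjoint" as each interval starting strictly after the previous interval of the same individual ends, which is an assumption about the `lag = 0` case.

```r
# Hypothetical helper, not part of EpiForsk: check that a result is sorted
# by id and in_date and that intervals within each id do not overlap.
# Column names are passed as strings.
is_flat <- function(res, id, in_date, out_date) {
  n <- nrow(res)

  # Sorted by id, then in_date (order() leaves sorted data untouched)
  sorted <- identical(order(res[[id]], res[[in_date]]), seq_len(n))

  # Compare each row with the row before it
  same_id <- res[[id]][-1] == res[[id]][-n]
  starts_later <- res[[in_date]][-1] > res[[out_date]][-n]

  sorted && all(starts_later[same_id])
}

flat <- data.frame(ID = c(1, 1, 2), START = c(1, 5, 3), END = c(3, 7, 4))
raw <- data.frame(ID = c(1, 1), START = c(1, 2), END = c(3, 3))
is_flat(flat, "ID", "START", "END")
is_flat(raw, "ID", "START", "END")
```

The first call checks data that is already collapsed; the second checks raw data in which the two intervals for ID 1 overlap.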
Author(s)
ADLS, EMTH & ASO
Examples
### The flatten function works with both dates and numeric values
dat <- data.frame(
ID = c(1, 1, 1, 2, 2, 3, 3, 4),
START = c(1, 2, 5, 3, 6, 2, 3, 6),
END = c(3, 3, 7, 4, 9, 3, 5, 8))
dat |> flatten_date_intervals(ID, START, END)
dat <- data.frame(
ID = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
START = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
"2002-03-14", "1997-02-27",
"2008-08-13", "1998-09-23",
"2005-01-12", "2007-05-10")),
END = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
"2005-02-26", "1999-04-16",
"2008-08-22", "2015-01-29",
"2007-05-07", "2008-12-12")))
dat |> flatten_date_intervals(ID, START, END)
### Allow for a 5-day lag between intervals
dat |> flatten_date_intervals(ID, START, END, lag = 5)
### Adding status information
dat <- data.frame(
ID = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
START = as.Date(c("2012-02-15", "2005-12-13", "2006-01-24",
"2002-03-14", "1997-02-27",
"2008-08-13", "1998-09-23",
"2005-01-12", "2007-05-10")),
END = as.Date(c("2012-06-03", "2007-02-05", "2006-08-22",
"2005-02-26", "1999-04-16",
"2008-08-22", "2015-01-29",
"2007-05-07", "2008-12-12")),
REGION = c("H", "H", "N", "S", "S", "M", "N", "S", "S"))
# Note the difference between the different overlap_handling methods
dat |> flatten_date_intervals(ID, START, END, REGION, "none")
dat |> flatten_date_intervals(ID, START, END, REGION, "first")
dat |> flatten_date_intervals(ID, START, END, REGION, "most_recent")