splitMulti {popEpi}    R Documentation

Split case-level observations

Description

Split a Lexis object along multiple time scales with speed and ease.

Usage

splitMulti(
  data,
  breaks = NULL,
  ...,
  drop = TRUE,
  merge = TRUE,
  verbose = FALSE
)

Arguments

data

a Lexis object with event cases as rows

breaks

a list of named vectors of breaks (numeric, or of the same class as the corresponding time scale, e.g. Date); see Details and Examples

...

alternate way of supplying breaks as named vectors, e.g. fot = 0:5 instead of breaks = list(fot = 0:5). If breaks is not NULL, breaks is used and any breaks passed through ... are NOT used. Note also that R partially matches argument names: if you supply e.g. dat = my_breaks and do not pass the argument data explicitly (data = my_data), R interprets this as data = my_breaks. Choose the names of your time scales accordingly.
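The partial-matching pitfall can be reproduced with a toy function. The sketch below assumes a hypothetical show_args() that only copies the leading formals of splitMulti (data, breaks, ..., drop) and reports where each supplied value ends up; it is not popEpi code.

```r
# Hypothetical stand-in with the same leading formals as splitMulti();
# it only reports which argument received which value.
show_args <- function(data, breaks = NULL, ..., drop = TRUE) {
  list(data = data, breaks = breaks, dots = list(...))
}

my_breaks <- 0:5

# "dat" is a prefix of the formal "data", which comes before `...`,
# so R partially matches it: data receives my_breaks.
res <- show_args(dat = my_breaks)
str(res)

# With data named explicitly, "dat" no longer matches a formal
# and is collected into `...` as intended.
res2 <- show_args(data = "my_data", dat = my_breaks)
str(res2)
```

Only formals that precede ... are partially matched, which is why data and breaks are exposed to this while drop, merge and verbose are not.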

drop

logical; if TRUE, drops all resulting rows after expansion that reside outside the time window defined by the given breaks

merge

logical; if TRUE, retains all variables from the original data, i.e. each original subject's variables are repeated on all of that subject's resulting rows
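Conceptually, merge = TRUE amounts to row-repeating the original data. The base-R sketch below assumes a hypothetical table orig and a hypothetical count n_rows of split rows per subject; it illustrates the idea only and is not how popEpi implements it.

```r
# Hypothetical original data: one row per subject with extra covariates
orig <- data.frame(lex.id = 1:2, sex = c("f", "m"), dg_age = c(61.2, 70.5),
                   stringsAsFactors = FALSE)

# Hypothetical split result: number of split rows produced per subject
n_rows <- c(3L, 2L)

# Repeat each subject's original variables once per split row
expanded <- orig[rep(seq_len(nrow(orig)), times = n_rows), ]
rownames(expanded) <- NULL
expanded
```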

verbose

logical; if TRUE, the function is chatty and emits some messages along the way

Details

splitMulti is in essence a data.table version of splitLexis or survSplit for splitting along multiple time scales. It requires a Lexis object as input.
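Splitting along several time scales works because all Lexis time scales advance at the same rate, so every scale's breaks can be re-expressed as "time since entry" on one shared follow-up axis. The sketch below uses a hypothetical helper split_one() for a single subject. It assumes numeric time scales, ignores status variables (lex.Cst, lex.Xst) and is not popEpi's implementation; it only shows the interval arithmetic, including the effect of drop.

```r
# Hypothetical helper: split one subject's follow-up along several scales.
# entry: named numeric vector of time-scale values at entry
# dur:   length of follow-up; brks: named list of breaks per scale
split_one <- function(entry, dur, brks, drop = TRUE) {
  # Re-express each scale's breaks as time since entry (shared axis)
  rel <- lapply(names(brks), function(s) brks[[s]] - entry[[s]])

  # Cut points strictly inside the follow-up define the split rows
  cuts <- sort(unique(unlist(rel)))
  cuts <- cuts[cuts > 0 & cuts < dur]
  start <- c(0, cuts)
  stop  <- c(cuts, dur)

  if (drop) {
    # Keep only rows inside every scale's window [min(breaks), max(breaks))
    lo <- max(vapply(rel, min, numeric(1)))
    hi <- min(vapply(rel, max, numeric(1)))
    keep  <- start >= lo & stop <= hi
    start <- start[keep]
    stop  <- stop[keep]
  }

  # Time-scale values at the start of each row, plus the row's duration
  out <- data.frame(lapply(entry, function(e) e + start))
  out$lex.dur <- stop - start
  out
}

x1 <- split_one(entry = c(fot = 0, per = 2007.6), dur = 3,
                brks = list(fot = c(0, 1, 2, 5), per = c(2008, 2013)))
x1
```

Here the piece from per = 2007.6 to 2008 lies before the first per break, so drop = TRUE removes it; with drop = FALSE all four pieces are kept and their durations still sum to the original follow-up.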

The breaks must be a list of named vectors of the appropriate type. The breaks are fully explicit, left-closed and right-open: e.g. fot=c(0,5) restricts each original row to the time within [0,5) (unless drop = FALSE). Use Inf or -Inf for open-ended intervals, e.g. per=c(1990,1995,Inf) creates the intervals [1990,1995) and [1995,Inf).
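The left-closed, right-open convention matches base R's cut() with right = FALSE. The sketch below only uses cut() to show which interval a few hypothetical time points fall into for per=c(1990,1995,Inf); it does not call splitMulti.

```r
# Breaks from the text: intervals [1990, 1995) and [1995, Inf)
per_breaks <- c(1990, 1995, Inf)

# Hypothetical calendar times, including values exactly on a break
t <- c(1989.5, 1990, 1994.999, 1995, 2010)

# right = FALSE gives left-closed, right-open intervals [a, b);
# dig.lab = 4 keeps the labels as plain years
iv <- cut(t, breaks = per_breaks, right = FALSE, dig.lab = 4)
as.character(iv)
```

A time exactly on a break (1995) belongs to the interval starting there, and 1989.5 lies before the first break and gets NA, analogous to the time that drop = TRUE discards.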

Instead of specifying breaks, one may make use of the ... argument to pass breaks: e.g.

splitMulti(x, breaks = list(fot = 0:5))

is equivalent to

splitMulti(x, fot = 0:5).

Multiple breaks can be supplied in the same manner. However, if both breaks and ... are used, only the breaks supplied via breaks are used and those passed through ... are ignored.
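The precedence rule can be written out as a small sketch. The helper resolve_breaks() is hypothetical and only mirrors the documented behaviour; it is not taken from popEpi's source.

```r
# Hypothetical helper mirroring the documented rule:
# breaks wins whenever it is not NULL; otherwise ... is used.
resolve_breaks <- function(breaks = NULL, ...) {
  if (is.null(breaks)) breaks <- list(...)
  breaks
}

b1 <- resolve_breaks(fot = 0:5)                                 # taken from ...
b2 <- resolve_breaks(breaks = list(fot = 0:5), per = c(2008, 2013))  # per ignored
str(b1)
str(b2)
```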

The Lexis time scale variables can be of any arbitrary format, e.g. Date, fractional years (see cal.yr and get.yrs), or other. However, date variables (from package date) are not recommended, because they are always stored as integers, whereas Date variables (see ?as.Date) are typically stored in double ("numeric") format. Double storage allows days to be broken into fractions as well, e.g. when using hypothetical years of 365.25 days.
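The following base-R sketch checks that a Date is stored as a double count of days since 1970-01-01 and that adding a fractional number of days keeps the fraction. The half-year of 0.5 * 365.25 days is the hypothetical year length mentioned above.

```r
d <- as.Date("2010-01-01")
typeof(d)        # storage type of a Date
unclass(d)       # days since 1970-01-01

# Hypothetical break half a year after d, using 365.25-day years
half_year <- 0.5 * 365.25
d2 <- d + half_year

# The fractional part of the day survives in the underlying value
unclass(d2) - unclass(d)
```

Printing d2 shows only the calendar day, but the underlying value keeps the fraction, which is what allows breaks that fall inside a day.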

Value

A data.table or data.frame (depending on options("popEpi.datatable"); see ?popEpi) object expanded to accommodate split observations.

Author(s)

Joonas Miettinen

See Also

splitLexis, Lexis, survSplit

Other splitting functions: lexpand(), splitLexisDT()

Examples

#### let's prepare data for computing period method survivals
#### in case there are problems with dates, we first 
#### convert to fractional years.

library("Epi")
library("data.table")
data("sire", package = "popEpi")
x <- Lexis(data=sire[dg_date < ex_date, ], 
           entry = list(fot=0, per=get.yrs(dg_date), age=dg_age), 
           exit=list(per=get.yrs(ex_date)), exit.status=status)
x2 <- splitMulti(x, breaks = list(fot=seq(0, 5, by = 3/12), per=c(2008, 2013)))
# equivalently:
x2 <- splitMulti(x, fot=seq(0, 5, by = 3/12), per=c(2008, 2013))

## using dates; note: breaks must be expressed as dates or days!
x <- Lexis(data=sire[dg_date < ex_date, ], 
           entry = list(fot=0, per=dg_date, age=dg_date-bi_date), 
           exit=list(per=ex_date), exit.status=status)
BL <- list(fot = seq(0, 5, by = 3/12)*365.242199,
           per = as.Date(paste0(c(1980:2014),"-01-01")),
           age = c(0,45,85,Inf)*365.242199)
x2 <- splitMulti(x, breaks = BL, verbose=TRUE)


## multistate example (healthy - sick - dead)
sire2 <- data.frame(sire)
sire2 <- sire2[sire2$dg_date < sire2$ex_date, ]

set.seed(1L) 
not_sick <- sample.int(nrow(sire2), 6000L, replace = FALSE)
sire2$dg_date[not_sick] <- NA
sire2$status[!is.na(sire2$dg_date) & sire2$status == 0] <- -1

sire2$status[sire2$status==2] <- 1
sire2$status <- factor(sire2$status, levels = c(0, -1, 1), 
                       labels = c("healthy", "sick", "dead"))
 
xm <- Lexis(data = sire2, 
            entry = list(fot=0, per=get.yrs(bi_date), age=0), 
            exit = list(per=get.yrs(ex_date)), exit.status=status)
xm2 <- cutLexis(xm, cut = get.yrs(xm$dg_date), 
                timescale = "per", 
                new.state = "sick")
xm2[xm2$lex.id == 6L, ]

xm2 <- splitMulti(xm2, breaks = list(fot = seq(0,150,25)))
xm2[xm2$lex.id == 6L, ]



[Package popEpi version 0.4.12 Index]