| aggre {popEpi} | R Documentation |
Aggregation of split Lexis data
Description
Aggregates a split Lexis object by given variables
and / or expressions into a long-format table of person-years and
transitions / end-points. Time scales by which the data has been split
are aggregated automatically to intervals (of e.g. calendar time,
follow-up time and/or age) if they are mentioned in the aggregation argument.
Usage
aggre(
lex,
by = NULL,
type = c("unique", "full"),
sum.values = NULL,
subset = NULL,
verbose = FALSE
)
Arguments
lex |
a |
by |
variables to tabulate (aggregate) by.
Flexible input, typically e.g.
|
type |
determines output levels to which data is aggregated varying
from returning only rows with |
sum.values |
optional: additional variables to sum by argument
|
subset |
a logical condition to subset by before computations;
e.g. |
verbose |
|
Details
Basics
aggre is intended for aggregation of split Lexis data only.
See Lexis for forming Lexis objects by hand
and e.g. splitLexis, splitLexisDT, and
splitMulti for splitting the data. lexpand
may be used for simple data sets to do both steps as well as aggregation
in the same function call.
Here aggregation refers to computing person-years and the appropriate events (state transitions and end points in status) for the subjects in the data. Hence, it computes e.g. deaths (end-point and state transition) and censorings (end-point) as well as events in a multi-state setting (state transitions).
The result is a long-format data.frame or data.table
(depending on options("popEpi.datatable"); see ?popEpi)
with the columns pyrs and the appropriate transitions named as
fromXtoY, e.g. from0to0 and from0to1 depending
on the values of lex.Cst and lex.Xst.
The by argument
The by argument determines the length of the table, i.e.
the combinations of variables to which data is aggregated.
by is relatively flexible, as it can be supplied as
* a character string vector, e.g. c("sex", "area"), naming variables existing in lex
* an expression, e.g. factor(sex, 0:1, c("m", "f")), using any variable found in lex
* a list (fully or partially named) of expressions, e.g. list(gender = factor(sex, 0:1, c("m", "f")), area)
Note that expressions effectively allow a variable to be supplied simply as
e.g. by = sex (as a symbol/name in R lingo).
The data is then aggregated to the levels of the given variables
or expression(s). Variables defined to be time scales in the supplied
Lexis are processed in a special way: If any are mentioned in the
by argument, intervals of them are formed based on the breaks
used to split the data: e.g. if age was split using the breaks
c(0, 50, Inf), mentioning age in by leads to
creating the age intervals [0, 50) and [50, Inf)
and aggregating to them. The intervals are identified in the output
as the lower bounds of the appropriate intervals.
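The interval labelling described above can be illustrated with base R alone. The following is a minimal sketch, not how aggre is implemented: the vectors age and pyrs are hypothetical toy data, and findInterval is used only to show how values map to the lower bounds of the intervals [0, 50) and [50, Inf).

```r
## hypothetical toy data: age at the start of each row, and person-years
breaks <- c(0, 50, Inf)
age <- c(12.5, 49.9, 50, 73.2)
pyrs <- c(1, 2, 3, 4)

## index i of the interval [breaks[i], breaks[i + 1]) containing each age
idx <- findInterval(age, breaks)

## label each row by the lower bound of its interval
age_lower <- breaks[idx]

## sum person-years within each interval
pyrs_by_age <- tapply(pyrs, age_lower, sum)
```

Here an age of exactly 50 falls into [50, Inf), because the intervals are closed on the left and open on the right.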
The order of multiple time scales mentioned in by matters,
as the last mentioned time scale is assumed to be the survival time scale
when computing event counts. E.g. when the data is split by the breaks
list(FUT = 0:5, CAL = c(2008,2010)), time lines cut short at
CAL = 2010 are considered to be censored, but time lines cut short at
FUT = 5 are not. See Value.
Aggregation types (styles)
It is almost always enough to aggregate the data to variable levels
that are actually represented in the data
(default type = "unique"; alias "non-empty").
For certain uses it may be useful
to also have "empty" levels represented (resulting in some rows in output
with zero person-years and events); in these cases supplying
type = "full" (alias "cartesian") causes aggre
to determine the Cartesian product of all the levels of the supplied
by variables or expressions and aggregate to them. As an example
of a Cartesian product, try
merge(1:2, 1:5).
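The difference between the two styles can be sketched in base R. This is only an illustration under stated assumptions, not the code aggre uses: the data frame d and the names lvls_unique, lvls_full and tab_full are hypothetical.

```r
## hypothetical toy data with one unobserved combination (sex = 1, agegr = 50)
d <- data.frame(sex = c(0, 0, 1), agegr = c(0, 50, 0), pyrs = c(1.5, 2.0, 0.5))

## "unique" style: only the combinations present in the data
lvls_unique <- unique(d[, c("sex", "agegr")])

## "full" style: Cartesian product of the levels of each variable;
## merge() on data frames sharing no column names performs a cross join
lvls_full <- merge(unique(d["sex"]), unique(d["agegr"]))

## attach person-years and set the unobserved combination to zero
tab_full <- merge(lvls_full, d, by = c("sex", "agegr"), all.x = TRUE)
tab_full$pyrs[is.na(tab_full$pyrs)] <- 0
```

The "full" table has one row more than the "unique" one, and that extra row carries zero person-years.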
Value
A long data.frame or data.table of aggregated person-years
(pyrs), numbers of subjects at risk (at.risk), and events
formatted fromXtoY, where X and Y are the states
transitioned from and to, or the state at the end of each lex.id's
follow-up (implying X = Y). Subjects at risk are counted
at the beginning of each interval defined by the Lexis time scales
mentioned in by, whereas events may occur at any point within an interval.
When the data has been split along multiple time scales, the last
time scale mentioned in by is considered to be the survival time
scale with regard to computing events. Time lines cut short by the
extrema of non-survival-time-scales are considered to be censored
("transitions" from the current state to the current state).
Author(s)
Joonas Miettinen
See Also
aggregate for a similar base R solution,
and ltable for a data.table based aggregator. Neither
is directly applicable to split Lexis data.
Other aggregation functions:
as.aggre(),
lexpand(),
setaggre(),
summary.aggre()
Examples
## form a Lexis object
library(Epi)
library(popEpi) ## provides sibr, popmort, get.yrs, splitMulti, lexpand and aggre
data(sibr)
x <- sibr[1:10,]
x[1:5,]$sex <- 0 ## pretend some are male
x <- Lexis(data = x,
           entry = list(AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           entry.status = 0, exit.status = status)
x <- splitMulti(x, breaks = list(CAL = seq(1993, 2013, 5),
                                 AGE = seq(0, 100, 50)))
## these produce the same results (with differing ways of determining aggre)
a1 <- aggre(x, by = list(gender = factor(sex, 0:1, c("m", "f")),
                         agegroup = AGE, period = CAL))
a2 <- aggre(x, by = c("sex", "AGE", "CAL"))
a3 <- aggre(x, by = list(sex, agegroup = AGE, CAL))
## returning also empty levels
a4 <- aggre(x, by = c("sex", "AGE", "CAL"), type = "full")
## computing also expected numbers of cases
x <- lexpand(sibr[1:10,], birth = bi_date, entry = dg_date,
             exit = ex_date, status = status %in% 1:2,
             pophaz = popmort, fot = 0:5, age = c(0, 50, 100))
x$d.exp <- with(x, lex.dur*pop.haz)
## these produce the same result
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = list(d.exp))
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = "d.exp")
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = d.exp)
## same result here with custom name
a5 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = list(expCases = d.exp))
## computing Pohar-Perme weighted figures
x$d.exp.pp <- with(x, lex.dur*pop.haz*pp)
a6 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = c("d.exp", "d.exp.pp"))
## or equivalently e.g. sum.values = list(expCases = d.exp, expCases.p = d.exp.pp).