aggre {popEpi} | R Documentation
Aggregation of split Lexis data
Description
Aggregates a split Lexis object by given variables and/or expressions into a long-format table of person-years and transitions / end-points. If any time scales by which the data has been split are mentioned in the aggregation argument, the data is automatically aggregated to the corresponding intervals of e.g. calendar time, follow-up time and/or age.
Usage
aggre(
  lex,
  by = NULL,
  type = c("unique", "full"),
  sum.values = NULL,
  subset = NULL,
  verbose = FALSE
)
Arguments
lex |
a |
by |
variables to tabulate (aggregate) by.
Flexible input, typically e.g.
|
type |
determines output levels to which data is aggregated varying
from returning only rows with |
sum.values |
optional: additional variables to sum by argument
|
subset |
a logical condition to subset by before computations;
e.g. |
verbose |
|
Details
Basics
aggre is intended for aggregation of split Lexis data only. See Lexis for forming Lexis objects by hand and e.g. splitLexis, splitLexisDT, and splitMulti for splitting the data. lexpand may be used for simple data sets to do both steps as well as aggregation in the same function call.
Here aggregation refers to computing person-years and the appropriate events (state transitions and end points in status) for the subjects in the data. Hence, it computes e.g. deaths (end-point and state transition) and censorings (end-point) as well as events in a multi-state setting (state transitions).
The result is a long-format data.frame or data.table (depending on options("popEpi.datatable"); see ?popEpi) with the columns pyrs and the appropriate transitions named as fromXtoY, e.g. from0to0 and from0to1, depending on the values of lex.Cst and lex.Xst.
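The fromXtoY naming can be illustrated with a minimal base R sketch. It is not popEpi's implementation: the toy data frame and its values are hypothetical stand-ins for a split Lexis object, rows are assumed to be ordered by time within each lex.id, and only a single time scale is assumed, so the censoring rules for multiple time scales described later are ignored.

```r
# Hypothetical toy rows standing in for a split Lexis object.
toy <- data.frame(
  lex.id  = c(1, 1, 2, 3),
  lex.dur = c(2.0, 1.5, 3.0, 0.5),
  lex.Cst = c(0, 0, 0, 0),
  lex.Xst = c(0, 1, 0, 1)
)

# Name each row's transition in the fromXtoY style.
toy$transition <- paste0("from", toy$lex.Cst, "to", toy$lex.Xst)

# A row counts as an event if the state changes, or if it is the last
# row of a subject's follow-up (an end point, where X equals Y).
is_last <- !duplicated(toy$lex.id, fromLast = TRUE)
keep <- toy$lex.Cst != toy$lex.Xst | is_last

pyrs <- sum(toy$lex.dur)              # total person-years
counts <- table(toy$transition[keep]) # event counts by transition name
```

Here subject 1 contributes one from0to1 event, and the intermediate from0to0 row created by splitting is not counted because it is not the end of follow-up.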
The by argument
The by argument determines the length of the table, i.e. the combinations of variables to which data is aggregated. by is relatively flexible, as it can be supplied as
- a character string vector, e.g. c("sex", "area"), naming variables existing in lex
- an expression, e.g. factor(sex, 0:1, c("m", "f")), using any variable found in lex
- a list (fully or partially named) of expressions, e.g. list(gender = factor(sex, 0:1, c("m", "f")), area)
Note that expressions effectively allow a variable to be supplied simply as e.g. by = sex (as a symbol/name in R lingo).
The data is then aggregated to the levels of the given variables or expression(s). Variables defined to be time scales in the supplied Lexis are processed in a special way: If any are mentioned in the by argument, intervals of them are formed based on the breaks used to split the data: e.g. if age was split using the breaks c(0, 50, Inf), mentioning age in by leads to creating the age intervals [0, 50) and [50, Inf) and aggregating to them. The intervals are identified in the output as the lower bounds of the appropriate intervals.
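The labelling of intervals by their lower bounds can be sketched in base R as follows. This only illustrates the idea and is not how aggre works internally; the age and pyrs vectors are made-up values.

```r
# Breaks used to split the (hypothetical) data along age.
breaks <- c(0, 50, Inf)

# Made-up ages at the start of each row, with person-years per row.
age  <- c(12.3, 49.9, 50, 71.2)
pyrs <- c(1.0, 0.5, 2.0, 1.5)

# findInterval() uses left-closed intervals [0, 50) and [50, Inf),
# so an age of exactly 50 falls into the second interval.
idx   <- findInterval(age, breaks)
lower <- breaks[idx] # label each row by the lower bound of its interval

# Sum person-years within each interval.
agg <- aggregate(pyrs ~ lower, data = data.frame(lower, pyrs), FUN = sum)
```

The resulting table has one row per interval, identified by the values 0 and 50.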
The order of multiple time scales mentioned in by matters, as the last mentioned time scale is assumed to be the survival time scale when computing event counts. E.g. when the data is split by the breaks list(FUT = 0:5, CAL = c(2008,2010)), time lines cut short at CAL = 2010 are considered to be censored, but time lines cut short at FUT = 5 are not. See Value.
Aggregation types (styles)
It is almost always enough to aggregate the data to variable levels that are actually represented in the data (default type = "unique"; alias "non-empty"). For certain uses it may be useful to also have "empty" levels represented (resulting in some rows in the output with zero person-years and events); in these cases supplying type = "full" (alias "cartesian") causes aggre to determine the Cartesian product of all the levels of the supplied by variables or expressions and aggregate to them. As an example of a Cartesian product, try merge(1:2, 1:5).
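The difference between the two styles can be sketched in base R. This is an illustration under assumptions, not popEpi code: the toy data frame is hypothetical, and aggregate, expand.grid and merge stand in for whatever aggre does internally.

```r
# Hypothetical toy data: the combination sex = 1, area = "B" is absent.
toy <- data.frame(
  sex  = c(0, 0, 1),
  area = c("A", "B", "A"),
  pyrs = c(1.2, 0.8, 2.5),
  stringsAsFactors = FALSE
)

# "unique"-style: only combinations that appear in the data.
uniq <- aggregate(pyrs ~ sex + area, data = toy, FUN = sum)

# "full"-style: Cartesian product of all levels, empty cells set to zero.
grid <- expand.grid(
  sex  = sort(unique(toy$sex)),
  area = sort(unique(toy$area)),
  stringsAsFactors = FALSE
)
full <- merge(grid, uniq, by = c("sex", "area"), all.x = TRUE)
full$pyrs[is.na(full$pyrs)] <- 0

# merge() with no shared column names also yields a Cartesian product.
cart <- merge(1:2, 1:5)
```

In this sketch uniq has three rows, while full has four, the extra one carrying zero person-years.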
Value
A long data.frame or data.table of aggregated person-years (pyrs), numbers of subjects at risk (at.risk), and events formatted fromXtoY, where X and Y are the states transitioned from and to, or the state at the end of each lex.id's follow-up (implying X = Y). Subjects at risk are computed at the beginning of an interval defined by any Lexis time scales mentioned in by, but events may occur at any point within an interval.
When the data has been split along multiple time scales, the last time scale mentioned in by is considered to be the survival time scale with regard to computing events. Time lines cut short by the extrema of non-survival-time-scales are considered to be censored ("transitions" from the current state to the current state).
Author(s)
Joonas Miettinen
See Also
aggregate for a similar base R solution, and ltable for a data.table based aggregator. Neither is directly applicable to split Lexis data.
Other aggregation functions: as.aggre(), lexpand(), setaggre(), summary.aggre()
Examples
## form a Lexis object
library(Epi)
library(popEpi) ## provides sibr, get.yrs, splitMulti, lexpand and aggre
data(sibr)
x <- sibr[1:10,]
x[1:5,]$sex <- 0 ## pretend some are male
x <- Lexis(data = x,
           entry = list(AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           entry.status = 0, exit.status = status)
x <- splitMulti(x, breaks = list(CAL = seq(1993, 2013, 5),
                                 AGE = seq(0, 100, 50)))
## these produce the same results (with differing ways of specifying
## the aggregation variables)
a1 <- aggre(x, by = list(gender = factor(sex, 0:1, c("m", "f")),
                         agegroup = AGE, period = CAL))
a2 <- aggre(x, by = c("sex", "AGE", "CAL"))
a3 <- aggre(x, by = list(sex, agegroup = AGE, CAL))
## returning also empty levels
a4 <- aggre(x, by = c("sex", "AGE", "CAL"), type = "full")
## computing also expected numbers of cases
x <- lexpand(sibr[1:10,], birth = bi_date, entry = dg_date,
             exit = ex_date, status = status %in% 1:2,
             pophaz = popmort, fot = 0:5, age = c(0, 50, 100))
x$d.exp <- with(x, lex.dur*pop.haz)
## these produce the same result
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = list(d.exp))
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = "d.exp")
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = d.exp)
## same result here with custom name
a5 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = list(expCases = d.exp))
## computing Pohar-Perme weighted figures
x$d.exp.pp <- with(x, lex.dur*pop.haz*pp)
a6 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = c("d.exp", "d.exp.pp"))
## or equivalently e.g. sum.values = list(expCases = d.exp, expCases.p = d.exp.pp).