survmean {popEpi} | R Documentation |
Compute Mean Survival Times Using Extrapolation
Description
Computes mean survival times based on survival estimation up to
a point in follow-up time (e.g. 10 years),
after which survival is extrapolated
using an appropriate hazard data file (pophaz
) to yield the "full"
survival curve. The area under the full survival curve is the mean survival.
Usage
survmean(
formula,
data,
adjust = NULL,
weights = NULL,
breaks = NULL,
pophaz = NULL,
e1.breaks = NULL,
e1.pophaz = pophaz,
r = "auto",
surv.method = "hazard",
subset = NULL,
verbose = FALSE
)
Arguments
formula |
a |
data |
a |
adjust |
variables to adjust estimates by, e.g. |
weights |
weights to use to adjust mean survival times. See the
dedicated help page for more details on
weighting. |
breaks |
a list of breaks defining the time window to compute
observed survival in, and the intervals used in estimation. E.g.
|
pophaz |
a data set of population hazards passed to
|
e1.breaks |
|
e1.pophaz |
Same as |
r |
either a numeric multiplier such as |
surv.method |
passed to |
subset |
a logical condition; e.g. |
verbose |
|
Details
Basics
survmean
computes mean survival times. For median survival times
(i.e. where 50 % of subjects have died or met some other event)
use survtab
.
The mean survival time is simply the area under the survival curve. However, since full follow-up rarely happens, the observed survival curves are extrapolated using expected survival: E.g. one might compute observed survival till up to 10 years and extrapolate beyond that (till e.g. 50 years) to yield an educated guess on the full observed survival curve.
The area is computed by trapezoidal integration of the area under the curve. This function also computes the "full" expected survival curve from T = 0 till e.g. T = 50 depending on supplied arguments. The expected mean survival time is the area under the mean expected survival curve. This function returns the mean expected survival time to be compared with the mean survival time and for computing years of potential life lost (YPLL).
Results can be formed by strata and adjusted for e.g. age by using
the formula
argument as in survtab
. See also Examples.
Extrapolation tweaks
Argument r
controls the relative survival ratio (RSR) assumed to
persist beyond the time window where observed survival is computed
(defined by argument breaks
; e.g. up to FUT = 10
).
The RSR is simply RSR_i = p_oi / p_ei
for a time interval i
,
i.e. the observed divided by the expected
(conditional, not cumulative) probability of surviving from the beginning of
a time interval till its end. The cumulative product of RSR_i
over time is the (cumulative) relative survival curve.
If r
is numeric, e.g. r = 0.995
, that RSR level is assumed
to persist beyond the observed survival curve.
Numeric r
should be > 0
and expressed at the annual level
when using fractional years as the scale of the time variables.
E.g. if RSR is known to be 0.95
at the month level, then the
annualized RSR is 0.95^12
. This enables correct usage of the RSR
with survival intervals of varying lengths. When using day-level time
variables (such as Dates
; see as.Date
), numeric r
should be expressed at the day level, etc.
If r = "auto"
or r = "auto1"
, this function computes
RSR estimates internally and automatically uses the RSR_i
in the last survival interval in each stratum (and adjusting group)
and assumes that to persist beyond the observed survival curve.
Automatic determination of r
is a good starting point,
but in situations where the RSR estimate is uncertain it may produce poor
results. Using "autoX"
such as "auto6"
causes survmean
to use the mean of the estimated RSRs in the last X survival intervals,
which may be more stable.
Automatic determination will not use values >1
but set them to 1.
Visual inspection of the produced curves is always recommended: see
Examples.
One may also tweak the accuracy and length of extrapolation and
expected survival curve computation by using
e1.breaks
. By default this is whatever was supplied to breaks
for the survival time scale, to which
c(seq(1/12, 1, 1/12), seq(1.2, 1.8, 0.2), 2:19, seq(20, 50, 5))
is added after the maximum value, e.g. with breaks = list(FUT = 0:10)
we have
..., 10+1/12, ..., 11, 11.2, ..., 2, 3, ..., 19, 20, 25, ... 50
as the e1.breaks
. Supplying e1.breaks
manually requires
the breaks over time survival time scale supplied to argument breaks
to be reiterated in e1.breaks
; see Examples. NOTE: the
default extrapolation breaks assume the time scales in the data to be
expressed as fractional years, meaning this will work extremely poorly
when using e.g. day-level time scales (such as Date
variables).
Set the extrapolation breaks manually in such cases.
Value
Returns a data.frame
or data.table
(depending on
getOptions("popEpi.datatable")
; see ?popEpi
) containing the
following columns:
-
est
: The estimated mean survival time -
exp
: The computed expected survival time -
obs
: Counts of subjects in data -
YPLL
: Years of Potential Life Lost, computed as ((exp - est) * obs
) — though your time data may be in e.g. days, this column will have the same name regardless. The returned data also has columns named according to the variables supplied to the right-hand-side of the formula.
Author(s)
Joonas Miettinen
See Also
Other survmean functions:
Surv()
,
lines.survmean()
,
plot.survmean()
Other main functions:
Surv()
,
rate()
,
relpois()
,
relpois_ag()
,
sir()
,
sirspline()
,
survtab()
,
survtab_ag()
Examples
library(Epi)
## take 500 subjects randomly for demonstration
data(sire)
sire <- sire[sire$dg_date < sire$ex_date, ]
set.seed(1L)
sire <- sire[sample(x = nrow(sire), size = 500),]
## NOTE: recommended to use factor status variable
x <- Lexis(entry = list(FUT = 0, AGE = dg_age, CAL = get.yrs(dg_date)),
exit = list(CAL = get.yrs(ex_date)),
data = sire,
exit.status = factor(status, levels = 0:2,
labels = c("alive", "canD", "othD")),
merge = TRUE)
## phony variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)
## age group
x$agegr <- cut(x$dg_age, c(0,45,60,Inf), right=FALSE)
## population hazards data set
pm <- data.frame(popEpi::popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")
## breaks to define observed survival estimation
BL <- list(FUT = seq(0, 10, 1/12))
## crude mean survival
sm1 <- survmean(Surv(FUT, lex.Xst != "alive") ~ 1,
pophaz = pm, data = x, weights = NULL,
breaks = BL)
sm1 <- survmean(FUT ~ 1,
pophaz = pm, data = x, weights = NULL,
breaks = BL)
## mean survival by group
sm2 <- survmean(FUT ~ group,
pophaz = pm, data = x, weights = NULL,
breaks = BL)
## ... and adjusted for age using internal weights (counts of subjects)
## note: need also longer extrapolation here so that all curves
## converge to zero in the end.
eBL <- list(FUT = c(BL$FUT, 11:75))
sm3 <- survmean(FUT ~ group + adjust(agegr),
pophaz = pm, data = x, weights = "internal",
breaks = BL, e1.breaks = eBL)
## visual inspection of how realistic extrapolation is for each stratum;
## solid lines are observed + extrapolated survivals;
## dashed lines are expected survivals
plot(sm1)
## plotting object with both stratification and standardization
## plots curves for each strata-std.group combination
plot(sm3)
## for finer control of plotting these curves, you may extract
## from the survmean object using e.g.
attributes(sm3)$survmean.meta$curves
#### using Dates
x <- Lexis(entry = list(FUT = 0L, AGE = dg_date-bi_date, CAL = dg_date),
exit = list(CAL = ex_date),
data = sire[sire$dg_date < sire$ex_date, ],
exit.status = factor(status, levels = 0:2,
labels = c("alive", "canD", "othD")),
merge = TRUE)
## phony group variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)
## NOTE: population hazard should be reported at the same scale
## as time variables in your Lexis data.
data(popmort, package = "popEpi")
pm <- data.frame(popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")
## from year to day level
pm$haz <- pm$haz/365.25
pm$CAL <- as.Date(paste0(pm$CAL, "-01-01"))
pm$AGE <- pm$AGE*365.25
BL <- list(FUT = seq(0, 8, 1/12)*365.25)
eBL <- list(FUT = c(BL$FUT, c(8.25,8.5,9:60)*365.25))
smd <- survmean(FUT ~ group, data = x,
pophaz = pm, verbose = TRUE, r = "auto5",
breaks = BL, e1.breaks = eBL)
plot(smd)