R: Estimate Survival Time Functions

survtab_ag {popEpi}

R Documentation

Estimate Survival Time Functions

Description

This function estimates survival time functions: survival, relative/net survival, and crude/absolute risk functions (CIF).

Usage

survtab_ag(
  formula = NULL,
  data,
  adjust = NULL,
  weights = NULL,
  surv.breaks = NULL,
  n = "at.risk",
  d = "from0to1",
  n.cens = "from0to0",
  pyrs = "pyrs",
  d.exp = "d.exp",
  n.pp = NULL,
  d.pp = "d.pp",
  d.pp.2 = "d.pp.2",
  n.cens.pp = "n.cens.pp",
  pyrs.pp = "pyrs.pp",
  d.exp.pp = "d.exp.pp",
  surv.type = "surv.rel",
  surv.method = "hazard",
  relsurv.method = "e2",
  subset = NULL,
  conf.level = 0.95,
  conf.type = "log-log",
  verbose = FALSE
)

Arguments

`formula`	a `formula`; the response must be the time scale to compute survival time function estimates over, e.g. `fot ~ sex`. Variables on the right-hand side of the formula separated by `+` are considered stratifying variables, for which estimates are computed separately. May contain usage of `adjust()` — see Details and Examples.
`data`	since popEpi 0.4.0, a `data.frame` containing variables used in `formula` and other arguments. `aggre` objects are recommended as they contain information on any time scales and are therefore safer; for creating `aggre` objects see `as.aggre` when your data is already aggregated and `aggre` for aggregating split `Lexis` objects.
`adjust`	can be used as an alternative to passing variables to argument `formula` within a call to `adjust()`; e.g. `adjust = "agegr"`. Flexible input.
`weights`	typically a list of weights or a `character` string specifying an age group standardization scheme; see the dedicated help page and examples. NOTE: `weights = "internal"` is based on the counts of persons in follow-up at the start of follow-up (typically T = 0)
`surv.breaks`	a vector of breaks on the survival time scale. Optional if `data` is an `aggre` object and mandatory otherwise. Must define each intended interval; e.g. `surv.breaks = 0:5` when data has intervals defined by breaks `seq(0, 5, 1/12)` will aggregate to wider intervals first. It is generally recommended (and sufficient; see Seppa, Dyba and Hakulinen (2015)) to use monthly intervals where applicable.
`n`	variable containing counts of subjects at-risk at the start of a time interval; e.g. `n = "at.risk"`. Required when `surv.method = "lifetable"`. Flexible input.
`d`	variable(s) containing counts of subjects experiencing an event. With only one type of event, e.g. `d = "deaths"`. With multiple types of events (for CIF or cause-specific survival estimation), supply e.g. `d = c("canD", "othD")`. If the survival time function to be estimated does not use multiple types of events, supplying more than one variable to `d` simply causes the variables to be added together. Always required. Flexible input.
`n.cens`	variable containing counts of subjects censored during a survival time interval; e.g. `n.cens = "alive"`. Required when `surv.method = "lifetable"`. Flexible input.
`pyrs`	variable containing total subject-time accumulated within a survival time interval; e.g. `pyrs = "pyrs"`. Required when `surv.method = "hazard"`. Flexible input.
`d.exp`	variable denoting total "expected numbers of events" (typically computed as `pyrs * pop.haz`, where `pop.haz` is the expected hazard level) accumulated within a survival time interval; e.g. `d.exp = "d.exp"`. Required when computing EdererII relative survivals or CIFs based on excess counts of events. Flexible input.
`n.pp`	variable containing total Pohar-Perme weighted counts of subjects at risk in an interval, supplied in the same way as argument `n`. Computed originally on the subject level as analogous to `pp * as.integer(status == "at-risk")`. Required when `relsurv.method = "pp"`. Flexible input.
`d.pp`	variable(s) containing Pohar-Perme weighted counts of events, supplied in the same way as argument `d`. Computed originally on the subject level as analogous to `pp * as.integer(status == some_event)`. Required when `relsurv.method = "pp"`. Flexible input.
`d.pp.2`	variable(s) containing total Pohar-Perme "double-weighted" counts of events, supplied in the same way as argument `d`. Computed originally on the subject level as analogous to `pp * pp * as.integer(status == some_event)`. Required when `relsurv.method = "pp"`. Flexible input.
`n.cens.pp`	variable containing total Pohar-Perme weighted counts of censorings, supplied in the same way as argument `n.cens`. Computed originally on the subject level as analogous to `pp * as.integer(status == "censored")`. Required when `relsurv.method = "pp"`. Flexible input.
`pyrs.pp`	variable containing total Pohar-Perme weighted subject-times, supplied in the same way as argument `pyrs`. Computed originally on the subject level as analogous to `pp * pyrs`. Required when `relsurv.method = "pp"`. Flexible input.
`d.exp.pp`	variable containing total Pohar-Perme weighted counts of expected events, supplied in the same way as argument `d.exp`. Computed originally on the subject level as analogous to `pp * d.exp`. Required when `relsurv.method = "pp"`. Flexible input.
`surv.type`	one of `'surv.obs'`, `'surv.cause'`, `'surv.rel'`, `'cif.obs'` or `'cif.rel'`; defines what kind of survival time function(s) is/are estimated; see Details
`surv.method`	either `'lifetable'` or `'hazard'`; determines the method of calculating survival time functions, where the former computes ratios such as `p = d/(n - n.cens)` and the latter utilizes subject-times (typically person-years) for hazard estimates such as `d/pyrs` which are used to compute survival time function estimates. The former method requires argument `n.cens` and the latter argument `pyrs` to be supplied.
`relsurv.method`	either `'e2'` or `'pp'`; defines whether to compute relative survival using the EdererII method or using Pohar-Perme weighting; ignored if `surv.type != "surv.rel"`
`subset`	a logical condition; e.g. `subset = sex == 1`; subsets the data before computations
`conf.level`	confidence level used in confidence intervals; e.g. `0.95` for 95 percent confidence intervals
`conf.type`	character string; must be one of `"plain"`, `"log-log"` and `"log"`; defines the transformation used on the survival time function to yield confidence intervals via the delta method
`verbose`	logical; if `TRUE`, the function is chatty and returns some messages and timings along the process

Value

Returns a table of survival time function values and other information with survival intervals as rows. The table contains some of the following estimates of survival time functions, depending on the arguments used:

surv.obs - observed (raw, overall) survival
surv.obs.K - observed cause-specific survival for cause K
CIF_k - cumulative incidence function for cause k
CIF.rel - cumulative incidence function using excess cases
r.e2 - relative survival, EdererII
r.pp - relative survival, Pohar-Perme weighted

The suffix .as implies adjusted estimates, and .lo and .hi imply lower and upper confidence limits, respectively. The prefix SE. stands for standard error.
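
As a rough illustration of how the .lo and .hi limits relate to an estimate and its standard error under the three conf.type options, the following base R sketch applies the standard delta-method formulas. The values of surv and se are hypothetical, and the code is not taken from popEpi, so treat it as an approximation of what the function reports.

```r
# Hypothetical inputs: one survival estimate and its standard error.
surv <- 0.80
se <- 0.03
conf.level <- 0.95
z <- qnorm(1 - (1 - conf.level) / 2)

# "plain": symmetric interval on the original scale.
ci_plain <- c(lo = surv - z * se, hi = surv + z * se)

# "log": interval built on log(surv); the delta-method SE is se / surv.
ci_log <- c(lo = surv * exp(-z * se / surv), hi = surv * exp(z * se / surv))

# "log-log": interval built on log(-log(surv));
# the delta-method SE is se / (surv * abs(log(surv))).
se_ll <- se / (surv * abs(log(surv)))
ci_loglog <- c(lo = surv^exp(z * se_ll), hi = surv^exp(-z * se_ll))

ci <- rbind(plain = ci_plain, log = ci_log, "log-log" = ci_loglog)
print(ci)
```

Unlike the "plain" interval, the "log-log" interval always stays inside (0, 1), which is one reason it is a common default.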

Basics

This function computes interval-based estimates of survival time functions, where the intervals are set by the user. For product-limit-based estimation see packages survival and relsurv.

If surv.type = 'surv.obs', only 'raw' observed survival is estimated over the chosen time intervals. With surv.type = 'surv.rel', relative survival estimates are supplied in addition to observed survival figures.

surv.type = 'cif.obs' requests cumulative incidence functions (CIF) to be estimated. CIFs are estimated for each competing risk based on a survival-interval-specific proportional hazards assumption as described by Chiang (1968). With surv.type = 'cif.rel', a CIF is estimated using excess cases as the "cause-specific" cases. Finally, with surv.type = 'surv.cause', cause-specific survivals are estimated separately for each type of event.
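
The interval-specific proportional hazards idea can be sketched in base R as follows. This is a minimal illustration with made-up counts for two competing events, assuming the usual construction in which the interval-specific event probability is split between causes in proportion to their event counts. It is not popEpi's internal code.

```r
# Hypothetical aggregated data for three one-year intervals.
delta <- c(1, 1, 1)      # interval lengths
pyrs <- c(90, 70, 55)    # subject-time accumulated in each interval
d_can <- c(6, 5, 2)      # events of type "canD"
d_oth <- c(4, 5, 3)      # events of type "othD"
d_all <- d_can + d_oth   # assumed > 0 in every interval in this sketch

# Overall conditional survival within each interval (hazard-based).
p <- exp(-delta * d_all / pyrs)
surv <- cumprod(p)
surv_start <- c(1, head(surv, -1))  # survival at the start of each interval

# Split the interval-specific event probability between the causes
# in proportion to the cause-specific event counts.
q_can <- (1 - p) * d_can / d_all
q_oth <- (1 - p) * d_oth / d_all

# Cumulative incidence: sum of (alive at interval start) * (event in interval).
CIF_can <- cumsum(surv_start * q_can)
CIF_oth <- cumsum(surv_start * q_oth)

print(data.frame(surv, CIF_can, CIF_oth))
```

By construction, the CIFs and the overall survival add up to one at the end of every interval.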

In hazard-based estimation (surv.method = "hazard") survival time functions are transformations of the estimated corresponding hazard in the intervals. The hazard itself is estimated using counts of events (or excess events) and total subject-time in the interval. Life table estimates (surv.method = "lifetable") are constructed as transformations of probabilities computed using counts of events and counts of subjects at risk.
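
To make the difference between the two methods concrete, here is a small base R sketch with made-up counts. The life table part uses the common actuarial convention of subtracting half of the censored subjects from the number at risk. That convention is an assumption of this sketch and is not confirmed by this help page.

```r
# Hypothetical aggregated data for three one-year intervals.
delta <- c(1, 1, 1)        # interval lengths
n <- c(100, 80, 60)        # at risk at the start of each interval
d <- c(10, 10, 5)          # events in each interval
n.cens <- c(10, 10, 5)     # censorings in each interval
pyrs <- c(90, 70, 55)      # subject-time in each interval

# Hazard-based: estimate the hazard as d / pyrs and transform it.
haz <- d / pyrs
surv_hazard <- exp(-cumsum(delta * haz))

# Life table (actuarial convention, an assumption here): censored subjects
# are counted as at risk for half of the interval.
n_eff <- n - n.cens / 2
surv_lifetable <- cumprod(1 - d / n_eff)

print(data.frame(surv_hazard, surv_lifetable))
```

With short intervals the two sets of estimates tend to be very close, as they are in this example.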

The vignette survtab_examples has some practical examples.

Relative survival

When surv.type = 'surv.rel', the user can choose relsurv.method = 'pp', whereupon Pohar-Perme weighting is used. By default relsurv.method = 'e2', i.e. the Ederer II method is used to estimate relative survival.
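
A hazard-based Ederer II estimate can be sketched in base R as the ratio of observed to expected survival, or equivalently as a transformation of the excess hazard (d - d.exp) / pyrs. The counts below are made up and the code only illustrates the idea. It does not reproduce popEpi's implementation, and it does not cover Pohar-Perme weighting.

```r
# Hypothetical aggregated data for three one-year intervals.
delta <- c(1, 1, 1)
pyrs <- c(90, 70, 55)
d <- c(10, 10, 5)            # observed events
d.exp <- c(2.7, 2.4, 2.1)    # expected events, e.g. sums of pyrs * pop.haz

# Observed and expected survival from the respective hazards.
surv.obs <- exp(-cumsum(delta * d / pyrs))
surv.exp <- exp(-cumsum(delta * d.exp / pyrs))

# Relative survival as a ratio ...
r.e2 <- surv.obs / surv.exp

# ... which is the same as transforming the excess hazard directly.
r.e2_excess <- exp(-cumsum(delta * (d - d.exp) / pyrs))

print(data.frame(surv.obs, surv.exp, r.e2))
```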

Adjusted estimates

Adjusted estimates in this context mean computing estimates separately by the levels of adjusting variables and returning weighted averages of the estimates. For example, computing estimates separately by age groups and returning a weighted average estimate (age-adjusted estimate).

Adjusting requires specification of both the adjusting variables and the weights for all the levels of the adjusting variables. The former can be accomplished by using adjust() with the argument formula, or by supplying variables directly to argument adjust. E.g. the following are all equivalent:

formula = fot ~ sex + adjust(agegr) + adjust(area)

formula = fot ~ sex + adjust(agegr, area)

formula = fot ~ sex, adjust = c("agegr", "area")

formula = fot ~ sex, adjust = list(agegr, area)

The adjusting variables must match with the variable names in the argument weights; see the dedicated help page. Typically weights are supplied as a list or a data.frame. The former can be done by e.g.

weights = list(agegr = VEC1, area = VEC2),

where VEC1 and VEC2 are vectors of weights (which do not have to add up to one). See survtab_examples for an example of using a data.frame to pass weights.
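
The weighted-average idea behind adjusted estimates can be illustrated with a few lines of base R. The helper adjust_estimates and the numbers are hypothetical and are not part of popEpi. The sketch only shows why the weights do not have to add up to one.

```r
# Hypothetical helper: weighted average of stratum-specific estimates.
# est: matrix with survival intervals as rows and adjusting-variable
#      levels as columns; w: named vector of weights for those levels.
adjust_estimates <- function(est, w) {
  w <- w[colnames(est)]   # match weights to levels by name
  w <- w / sum(w)         # rescale so that the weights sum to one
  as.vector(est %*% w)
}

# Made-up estimates for two age groups over three intervals.
est <- cbind(young = c(0.95, 0.90, 0.86),
             old = c(0.85, 0.72, 0.60))
w <- c(old = 100, young = 300)

est_adj <- adjust_estimates(est, w)
print(est_adj)
```

Because the weights are rescaled inside the helper, multiplying all of them by a constant leaves the adjusted estimates unchanged.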

Period analysis and other data selection schemes

To calculate e.g. period analysis (delayed entry) estimates, limit the data when/before supplying it to this function. See survtab_examples.

Data requirements

survtab_ag computes estimates of survival time functions using pre-aggregated data. For using subject-level data directly, use survtab. For aggregating data, see lexpand and aggre.

By default, and if data is an aggre object (not mandatory), survtab_ag makes use of the exact same breaks that were used in splitting the original data (with e.g. lexpand), so it is not necessary to specify any surv.breaks. If specified, the surv.breaks must be a subset of the pertinent pre-existing breaks. When data is not an aggre object, breaks must always be specified. Interval lengths (delta in output) are also calculated based on whichever breaks are used, so the upper limit of the breaks should therefore be meaningful and never e.g. Inf.
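
The requirements on breaks can be checked before calling the function. The base R sketch below uses hypothetical quarterly breaks and made-up person-years to show the subset check, the interval lengths implied by the breaks, and how counts in narrow intervals add up to wider ones. It mirrors the description above and is not popEpi's own code.

```r
# Hypothetical breaks used when splitting the data (quarterly, 0 to 5 years).
fine_breaks <- seq(0, 5, by = 0.25)

# Wider breaks requested for estimation must be a subset of the fine ones.
coarse_breaks <- 0:5
is_subset <- all(coarse_breaks %in% fine_breaks)

# Interval lengths follow directly from the breaks.
delta <- diff(coarse_breaks)

# An infinite upper limit gives a meaningless interval length.
bad_delta <- diff(c(0, 1, 5, Inf))

# Aggregating to wider intervals: sum made-up person-years of the fine
# intervals falling inside each coarse interval.
fine_start <- head(fine_breaks, -1)
pyrs_fine <- rep(10, length(fine_start))
grp <- findInterval(fine_start, coarse_breaks)
pyrs_wide <- as.vector(tapply(pyrs_fine, grp, sum))

print(list(is_subset = is_subset, delta = delta, pyrs_wide = pyrs_wide))
```

With breaks such as seq(0, 5, by = 1/12), exact comparison with %in% may be affected by floating-point rounding, so a comparison with a small tolerance can be safer.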

References

Perme, Maja Pohar, Janez Stare, and Jacques Esteve. "On estimation in relative survival." Biometrics 68.1 (2012): 113-120. doi:10.1111/j.1541-0420.2011.01640.x

Hakulinen, Timo, Karri Seppa, and Paul C. Lambert. "Choosing the relative survival method for cancer survival estimation." European Journal of Cancer 47.14 (2011): 2202-2210. doi:10.1016/j.ejca.2011.03.011

Seppa, Karri, Timo Hakulinen, and Arun Pokhrel. "Choosing the net survival method for cancer survival estimation." European Journal of Cancer (2013). doi:10.1016/j.ejca.2013.09.019

Chiang, Chin Long. Introduction to stochastic processes in biostatistics. 1968. ISBN-13: 978-0471155003

Seppa K., Dyba T. and Hakulinen T.: Cancer Survival, Reference Module in Biomedical Sciences. Elsevier. 08-Jan-2015. doi:10.1016/B978-0-12-801238-3.02745-8

Examples

## see more examples with explanations in vignette("survtab_examples")

#### survtab_ag usage

data("sire", package = "popEpi")
## prepare data for e.g. 5-year "period analysis" for 2008-2012
## note: sire is a simulated cohort integrated into popEpi.
BL <- list(fot=seq(0, 5, by = 1/12),
           per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = status %in% 1:2,
             breaks = BL,
             pophaz = popmort,
             aggre = list(fot))
             
## calculate relative EdererII period method
## NOTE: x is an aggre object here, so surv.breaks are deduced
## automatically
st <- survtab_ag(fot ~ 1, data = x)

summary(st, t = 1:5) ## annual estimates
summary(st, q = list(r.e2 = 0.75)) ## 1st interval where r.e2 < 0.75 at end

plot(st)


## non-aggre data: the first (commented-out) call to survtab_ag would fail,
## because breaks cannot be deduced from a plain data.frame
df <- data.frame(x)
# st <- survtab_ag(fot ~ 1, data = df)
st <- survtab_ag(fot ~ 1, data = df, surv.breaks = BL$fot)

## calculate age-standardised 5-year relative survival ratio using 
## Ederer II method and period approach 

sire$agegr <- cut(sire$dg_age,c(0,45,55,65,75,Inf),right=FALSE)
BL <- list(fot=seq(0, 5, by = 1/12),
           per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = status %in% 1:2,
             breaks = BL,
             pophaz = popmort,
             aggre = list(agegr, fot))

## age standardisation using internal weights (age distribution of 
## patients diagnosed within the period window)
## (NOTE: what is done here is equivalent to using weights = "internal")
w <- aggregate(at.risk ~ agegr, data = x[x$fot == 0, ], FUN = sum)
names(w) <- c("agegr", "weights")

st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w)
plot(st, y = "r.e2.as", col = c("blue"))

## age standardisation using ICSS1 weights
data(ICSS)
## avoid naming the break vector "cut", which would mask the function name
age_breaks <- c(0, 45, 55, 65, 75, Inf)
agegr <- cut(ICSS$age, age_breaks, right = FALSE)
w <- aggregate(ICSS1~agegr, data = ICSS, FUN = sum)
names(w) <- c("agegr", "weights")

st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w)
lines(st, y = "r.e2.as", col = c("red"))


## cause-specific survival
sire$stat <- factor(sire$status, 0:2, c("alive", "canD", "othD"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = stat,
             breaks = BL,
             pophaz = popmort,
             aggre = list(agegr, fot))
st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w,
                 d = c("fromalivetocanD", "fromalivetoothD"),
                 surv.type = "surv.cause")
plot(st, y = "surv.obs.fromalivetocanD.as")
lines(st, y = "surv.obs.fromalivetoothD.as", col = "red")

[Package popEpi version 0.4.12 Index]