R: Estimate Survival Time Functions

survtab {popEpi}

R Documentation

Estimate Survival Time Functions

Description

This function estimates survival time functions: survival, relative/net survival, and crude/absolute risk functions (CIF).

Usage

survtab(
  formula,
  data,
  adjust = NULL,
  breaks = NULL,
  pophaz = NULL,
  weights = NULL,
  surv.type = "surv.rel",
  surv.method = "hazard",
  relsurv.method = "e2",
  subset = NULL,
  conf.level = 0.95,
  conf.type = "log-log",
  verbose = FALSE
)

Arguments

`formula`	a `formula`; e.g. `fot ~ sex`, where `fot` is the time scale over which you wish to estimate a survival time function; this assumes that `lex.Xst` in your data is the status variable in the intended format (almost always right). To be explicit, use `Surv`: e.g. `Surv(fot, lex.Xst) ~ sex`. Variables on the right-hand side of the formula separated by `+` are considered stratifying variables, for which estimates are computed separately. May contain usage of `adjust()` — see Details and Examples.
`data`	a `Lexis` object with at least the survival time scale
`adjust`	can be used as an alternative to passing variables to argument `formula` within a call to `adjust()`; e.g. `adjust = "agegr"`. Flexible input.
`breaks`	a named list of breaks, e.g. `list(FUT = 0:5)`. If the data have not been split in advance, `breaks` must at the very least contain a vector of breaks for splitting the survival time scale (the one mentioned in argument `formula`). If the data have already been split (e.g. using `splitMulti`) along at least the survival time scale in use, this may be `NULL`. Monthly intervals are generally recommended, and sufficient, where applicable; see Seppa, Dyba and Hakulinen (2015).
`pophaz`	a `data.frame` containing expected hazards for the event of interest to occur. See the dedicated help page. Required when `surv.type = "surv.rel"` or `"cif.rel"`. `pophaz` must contain one column named `"haz"`, and any number of other columns identifying levels of variables to do a merge with split data within `survtab`. Some columns may be time scales, which will allow for the expected hazard to vary by e.g. calendar time and age.
`weights`	typically a list of weights or a `character` string specifying an age group standardization scheme; see the dedicated help page and examples. NOTE: `weights = "internal"` is based on the counts of persons in follow-up at the start of follow-up (typically T = 0)
`surv.type`	one of `'surv.obs'`, `'surv.cause'`, `'surv.rel'`, `'cif.obs'` or `'cif.rel'`; defines what kind of survival time function(s) is/are estimated; see Details
`surv.method`	either `'lifetable'` or `'hazard'`; determines the method of calculating survival time functions, where the former computes ratios such as `p = d/(n - n.cens)` and the latter utilizes subject-times (typically person-years) for hazard estimates such as `d/pyrs` which are used to compute survival time function estimates. The former method requires argument `n.cens` and the latter argument `pyrs` to be supplied.
`relsurv.method`	either `'e2'` or `'pp'`; defines whether to compute relative survival using the EdererII method or using Pohar-Perme weighting; ignored if `surv.type != "surv.rel"`
`subset`	a logical condition; e.g. `subset = sex == 1`; subsets the data before computations
`conf.level`	confidence level used in confidence intervals; e.g. `0.95` for 95 percent confidence intervals
`conf.type`	character string; must be one of `"plain"`, `"log-log"` and `"log"`; defines the transformation used on the survival time function to yield confidence intervals via the delta method
`verbose`	logical; if `TRUE`, the function is chatty and returns some messages and timings along the process
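
As a rough illustration of how `conf.type` changes the limits, the sketch below applies the textbook delta-method formulas to a single survival estimate. The helper `surv_ci` and the input values are hypothetical and not part of popEpi; the exact computation inside `survtab` may differ.

```r
# Hypothetical helper (not a popEpi function): delta-method confidence
# limits for one survival estimate s with standard error se.
surv_ci <- function(s, se, conf.level = 0.95,
                    conf.type = c("log-log", "log", "plain")) {
  conf.type <- match.arg(conf.type)
  z <- qnorm(1 - (1 - conf.level) / 2)
  switch(conf.type,
    "plain" = c(lo = s - z * se, hi = s + z * se),
    "log" = c(lo = s * exp(-z * se / s), hi = s * exp(z * se / s)),
    "log-log" = {
      # standard error of log(-log(s)) by the delta method
      se_t <- se / (s * abs(log(s)))
      c(lo = s^exp(z * se_t), hi = s^exp(-z * se_t))
    }
  )
}

# made-up estimate close to 1, where the choice of transformation matters
ci_plain  <- surv_ci(0.95, 0.04, conf.type = "plain")
ci_log    <- surv_ci(0.95, 0.04, conf.type = "log")
ci_loglog <- surv_ci(0.95, 0.04, conf.type = "log-log")
```

With these inputs the untransformed limits extend above 1, whereas the log-log limits stay inside (0, 1), which is one common reason for preferring that transformation.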

Value

Returns a table of survival time function values and other information, with survival intervals as rows. The table contains some of the following estimates of survival time functions:

surv.obs - observed (raw, overall) survival
surv.obs.K - observed cause-specific survival for cause K
CIF_k - cumulative incidence function for cause k
CIF.rel - cumulative incidence function using excess cases
r.e2 - relative survival, EdererII
r.pp - relative survival, Pohar-Perme weighted

The suffix .as implies adjusted estimates, and .lo and .hi imply lower and upper confidence limits, respectively. The prefix SE. stands for standard error.

Basics

This function computes interval-based estimates of survival time functions, where the intervals are set by the user. For product-limit-based estimation see packages survival and relsurv.

If surv.type = 'surv.obs', only 'raw' observed survival is estimated over the chosen time intervals. With surv.type = 'surv.rel', relative survival estimates are supplied in addition to the observed survival figures.

surv.type = 'cif.obs' requests cumulative incidence functions (CIF) to be estimated. CIFs are estimated for each competing risk based on a survival-interval-specific proportional hazards assumption as described by Chiang (1968). With surv.type = 'cif.rel', a CIF is estimated using excess cases as the "cause-specific" cases. Finally, with surv.type = 'surv.cause', cause-specific survival is estimated separately for each type of event.
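
The interval-specific proportional hazards idea can be sketched in base R as follows. This is a minimal illustration with made-up counts for two competing causes, assuming a constant hazard within each interval; it is not the code used by `survtab`, and all object names are hypothetical.

```r
# Toy aggregated data: one element per survival interval (made-up numbers).
width <- c(1, 1, 1)   # interval lengths in years
pyrs  <- c(95, 80, 70) # person-years at risk
d_can <- c(6, 5, 3)    # deaths from the cause of interest
d_oth <- c(4, 3, 2)    # deaths from other causes
d_all <- d_can + d_oth

# overall hazard and conditional survival within each interval
haz <- d_all / pyrs
p   <- exp(-haz * width)
surv_end   <- cumprod(p)
surv_start <- c(1, head(surv_end, -1))

# within an interval, the probability of any event is split between the
# causes in proportion to the cause-specific event counts
q_can <- (1 - p) * d_can / d_all
q_oth <- (1 - p) * d_oth / d_all

# cumulative incidence: survive to the interval start, then fail from cause k
CIF_can <- cumsum(surv_start * q_can)
CIF_oth <- cumsum(surv_start * q_oth)
```

Under these assumptions the cause-specific CIFs and the overall survival sum to one at the end of every interval.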

In hazard-based estimation (surv.method = "hazard") survival time functions are transformations of the estimated corresponding hazard in the intervals. The hazard itself is estimated using counts of events (or excess events) and total subject-time in the interval. Life table estimates (surv.method = "lifetable") are constructed as transformations of probabilities computed using counts of events and counts of subjects at risk.
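
The difference between the two estimation methods described above can be sketched with a few lines of base R. The counts are made up, and the life table part uses the standard actuarial convention of removing half of the censored subjects from the denominator, which is an assumption here and may not match the internals of `survtab` exactly.

```r
# Toy aggregated data: one element per survival interval (made-up numbers).
width  <- c(1, 1, 1)     # interval lengths in years
n      <- c(100, 88, 76) # subjects at risk at the start of the interval
d      <- c(10, 8, 5)    # events in the interval
n.cens <- c(2, 4, 3)     # subjects censored in the interval
pyrs   <- c(94, 82, 72)  # person-years accumulated in the interval

# hazard-based: constant hazard d/pyrs within each interval
surv_haz <- cumprod(exp(-d / pyrs * width))

# life table (actuarial): conditional survival from counts of subjects,
# with half of the censored subjects removed from the denominator
n_eff   <- n - n.cens / 2
surv_lt <- cumprod(1 - d / n_eff)

round(cbind(surv_haz, surv_lt), 4)
```

With short intervals and few events per interval the two sets of estimates are typically very close to each other.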

The vignette survtab_examples has some practical examples.

Relative survival

When surv.type = 'surv.rel', the user can choose relsurv.method = 'pp', whereupon Pohar-Perme weighting is used. By default relsurv.method = 'e2', i.e. the Ederer II method is used to estimate relative survival.
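
A minimal base R sketch of hazard-based relative survival, in the spirit of the Ederer II approach, is given below. It assumes that the expected number of events in each interval (`d.exp`, a hypothetical name) has already been accumulated from the population hazards of the subjects still in follow-up; the numbers are made up, and Pohar-Perme weighting is not shown.

```r
# Toy aggregated data: one element per survival interval (made-up numbers).
width <- c(1, 1, 1)
pyrs  <- c(94, 82, 72)     # person-years at risk
d     <- c(10, 8, 5)       # observed deaths
d.exp <- c(2.1, 1.9, 1.8)  # expected deaths from population hazards (assumed)

# observed and expected survival from the corresponding hazards
surv.obs <- cumprod(exp(-d / pyrs * width))
surv.exp <- cumprod(exp(-d.exp / pyrs * width))

# relative survival from the excess hazard (d - d.exp) / pyrs
r <- cumprod(exp(-(d - d.exp) / pyrs * width))
```

Under the constant-hazard assumption, the relative survival obtained from the excess hazard equals the ratio of observed to expected survival.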

Adjusted estimates

Adjusted estimates in this context mean computing estimates separately by the levels of adjusting variables and returning weighted averages of the estimates, for example computing estimates separately by age group and returning a weighted average (an age-adjusted estimate).

Adjusting requires specification of both the adjusting variables and the weights for all the levels of the adjusting variables. The former can be accomplished by using adjust() with the argument formula, or by supplying variables directly to argument adjust. E.g. the following are all equivalent:

formula = fot ~ sex + adjust(agegr) + adjust(area)

formula = fot ~ sex + adjust(agegr, area)

formula = fot ~ sex, adjust = c("agegr", "area")

formula = fot ~ sex, adjust = list(agegr, area)

The adjusting variables must match with the variable names in the argument weights; see the dedicated help page. Typically weights are supplied as a list or a data.frame. The former can be done by e.g.

weights = list(agegr = VEC1, area = VEC2),

where VEC1 and VEC2 are vectors of weights (which do not have to add up to one). See survtab_examples for an example of using a data.frame to pass weights.
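
The weighted averaging itself can be illustrated in base R. The stratum-specific estimates and weights below are made up, and the object names are hypothetical; the point is only that the weights are rescaled to sum to one before averaging.

```r
# Made-up survival estimates at one time point, one per age group
est <- c("[0,30)" = 0.90, "[30,50)" = 0.80, "[50,Inf)" = 0.60)

# weights for the same age groups; they need not add up to one
w <- c("[0,30)" = 1, "[30,50)" = 2, "[50,Inf)" = 1)

# match weights to estimates by name, rescale, and average
w_scaled <- w[names(est)] / sum(w)
est.as   <- sum(w_scaled * est)

# multiplying all weights by a constant leaves the result unchanged
w10       <- 10 * w
est.as.10 <- sum(w10[names(est)] / sum(w10) * est)
```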

Period analysis and other data selection schemes

To calculate e.g. period analysis (delayed entry) estimates, limit the data to the intended window when or before supplying it to this function. See survtab_examples.
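
What limiting the data means for a period window can be sketched in base R. The helper `clip_to_period` is hypothetical and operates on plain vectors of calendar times rather than on a `Lexis` object; the vignette shows how to do the corresponding step with the package's own tools.

```r
# Hypothetical helper: keep only the follow-up that falls inside a calendar
# period window, expressed on the follow-up time scale (delayed entry).
clip_to_period <- function(cal_entry, cal_exit, window) {
  start <- pmax(cal_entry, window[1])
  stop  <- pmin(cal_exit, window[2])
  keep  <- start < stop  # drop subjects with no follow-up inside the window
  data.frame(
    id        = which(keep),
    fut_start = (start - cal_entry)[keep],  # delayed entry time
    fut_stop  = (stop - cal_entry)[keep]
  )
}

# made-up fractional-year entry and exit times for three subjects
cal_entry <- c(2005.5, 2009.0, 2001.0)
cal_exit  <- c(2010.0, 2012.5, 2007.0)
period <- clip_to_period(cal_entry, cal_exit, window = c(2008, 2013))
```

Here the first subject contributes follow-up only from 2.5 years after entry onwards, and the third subject is dropped entirely. In a full analysis the event status would also need to be set to censored for subjects whose follow-up is cut off by the end of the window.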

References

Perme, Maja Pohar, Janez Stare, and Jacques Esteve. "On estimation in relative survival." Biometrics 68.1 (2012): 113-120. doi:10.1111/j.1541-0420.2011.01640.x

Hakulinen, Timo, Karri Seppa, and Paul C. Lambert. "Choosing the relative survival method for cancer survival estimation." European Journal of Cancer 47.14 (2011): 2202-2210. doi:10.1016/j.ejca.2011.03.011

Seppa, Karri, Timo Hakulinen, and Arun Pokhrel. "Choosing the net survival method for cancer survival estimation." European Journal of Cancer (2013). doi:10.1016/j.ejca.2013.09.019

Chiang, Chin Long. Introduction to stochastic processes in biostatistics. 1968. ISBN-13: 978-0471155003

Seppa K., Dyba T. and Hakulinen T.: Cancer Survival, Reference Module in Biomedical Sciences. Elsevier. 08-Jan-2015. doi:10.1016/B978-0-12-801238-3.02745-8

Examples


data("sire", package = "popEpi")
library(Epi)

## NOTE: recommended to use factor status variable
x <- Lexis(entry = list(FUT = 0, AGE = dg_age, CAL = get.yrs(dg_date)), 
           exit = list(CAL = get.yrs(ex_date)), 
           data = sire[sire$dg_date < sire$ex_date, ],
           exit.status = factor(status, levels = 0:2, 
                                labels = c("alive", "canD", "othD")), 
           merge = TRUE)

## phony group variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)

## observed survival. explicit supplying of status:
st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
              surv.type = "surv.obs",
              breaks = list(FUT = seq(0, 5, 1/12)))
## this assumes the status is lex.Xst (right 99.9 % of the time)
st <- survtab(FUT ~ group, data = x, 
              surv.type = "surv.obs",
              breaks = list(FUT = seq(0, 5, 1/12)))
              
## relative survival (ederer II)
data("popmort", package = "popEpi")
pm <- data.frame(popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")
st <- survtab(FUT ~ group, data = x, 
              surv.type = "surv.rel",
              pophaz = pm,
              breaks = list(FUT = seq(0, 5, 1/12)))

## ICSS weights usage
data("ICSS", package = "popEpi")
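## age group breaks; named so that the base function cut() is not masked
age_breaks <- c(0, 30, 50, 70, Inf)
agegr <- cut(ICSS$age, age_breaks, right = FALSE)
## sum the ICSS1 weights within each age group
w <- aggregate(ICSS1 ~ agegr, data = ICSS, FUN = sum)
x$agegr <- cut(x$dg_age, age_breaks, right = FALSE)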
st <- survtab(FUT ~ group + adjust(agegr), data = x, 
              surv.type = "surv.rel",
              pophaz = pm, weights = w$ICSS1,
              breaks = list(FUT = seq(0, 5, 1/12)))

#### using dates with survtab
x <- Lexis(entry = list(FUT = 0L, AGE = dg_date-bi_date, CAL = dg_date),
           exit = list(CAL = ex_date),
           data = sire[sire$dg_date < sire$ex_date, ],
           exit.status = factor(status, levels = 0:2, 
                                labels = c("alive", "canD", "othD")), 
           merge = TRUE)
## phony group variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)

st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
              surv.type = "surv.obs",
              breaks = list(FUT = seq(0, 5, 1/12)*365.25))    
                  
## NOTE: population hazard should be reported at the same scale
## as time variables in your Lexis data.
data(popmort, package = "popEpi")
pm <- data.frame(popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")
## from year to day level
pm$haz <- pm$haz/365.25 
pm$CAL <- as.Date(paste0(pm$CAL, "-01-01")) 
pm$AGE <- pm$AGE*365.25 

st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
              surv.type = "surv.rel", relsurv.method = "e2",
              pophaz = pm,
              breaks = list(FUT = seq(0, 5, 1/12)*365.25))

[Package popEpi version 0.4.12 Index]