R: Nonparametric estimator of survival from right-truncated,...

survrtrunc {flexsurv}

R Documentation

Nonparametric estimator of survival from right-truncated, uncensored data

Description

Estimates the survivor function from right-truncated, uncensored data by reversing time, interpreting the data as left-truncated, applying the Kaplan-Meier / Lynden-Bell estimator and transforming back.

Usage

survrtrunc(t, rtrunc, tmax, data = NULL, eps = 0.001, conf.int = 0.95)

Arguments

`t`	Vector of observed times from an initial event to a final event.
`rtrunc`	Individual-specific right truncation points, so that each individual's survival time `t` would not have been observed if it was greater than the corresponding element of `rtrunc`. If any of these are greater than `tmax`, then the actual individual-level truncation point for these individuals is taken to be `tmax`.
`tmax`	Maximum possible time to event that could have been observed.
`data`	Data frame to find `t` and `rtrunc` in. If not supplied, these should be in the working environment.
`eps`	Small number that is added to `t` before implementing the time-reversed estimator, to ensure the risk set is consistent between forward and reverse time scales. It should be just large enough that `t+eps` is not `==t`. This should not need changing from the default of 0.001, unless `t` are extremely large or small and the data are rounded to integer.
`conf.int`	Confidence level, defaulting to 0.95.

Details

Note that this does not estimate the untruncated survivor function - instead it estimates the survivor function truncated above at a time defined by the maximum possible time that might have been observed in the data.

Define X as the time of the initial event, Y as the time of the final event, then we wish to determine the distribution of T = Y- X.

Observations are only recorded if Y \leq t_{max}. Then the distribution of T in the resulting sample is right-truncated by rtrunc = t_{max} - X.

Equivalently, the distribution of t_{max} - T is left-truncated, since it is only observed if t_{max} - T \geq X. Then the standard Kaplan-Meier type estimator as implemented in survfit is used (as described by Lynden-Bell, 1971) and the results transformed back.

This situation might happen in a disease epidemic, where X is the date of disease onset for an individual, Y is the date of death, and we wish to estimate the distribution of the time T from onset to death, given we have only observed people who have died by the date t_{max}.

If the estimated survival is unstable at the highest times, then consider replacing tmax by a slightly lower value, then if necessary, removing individuals with t > tmax, so that the estimand is changed to the survivor function truncated over a slightly narrower interval.

Value

A list with components:

time Time points where the estimated survival changes.

surv Estimated survival at time, truncated above at tmax.

se.surv Standard error of survival.

std.err Standard error of -log(survival). Named this way for consistency with survfit.

lower Lower confidence limits for survival.

upper Upper confidence limits for survival.

References

D. Lynden-Bell (1971) A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices of the Royal Astronomical Society, 155:95–118.

Seaman, S., Presanis, A. and Jackson, C. (2020) Review of methods for estimating distribution of time to event from right-truncated data.

Examples


## simulate some event time data
set.seed(1)
X <- rweibull(100, 2, 10)
T <- rweibull(100, 2, 10)

## truncate above
tmax <- 20
obs <- X + T < tmax
rtrunc <- tmax - X
dat <- data.frame(X, T, rtrunc)[obs,]
sf <-    survrtrunc(T, rtrunc, data=dat, tmax=tmax)
plot(sf, conf.int=TRUE)
## Kaplan-Meier estimate ignoring truncation is biased
sfnaive <- survfit(Surv(T) ~ 1, data=dat)
lines(sfnaive, conf.int=TRUE, lty=2, col="red")

## truncate above the maximum observed time
tmax <- max(X + T) + 10
obs <- X + T < tmax
rtrunc <- tmax - X
dat <- data.frame(X, T, rtrunc)[obs,]
sf <-    survrtrunc(T, rtrunc, data=dat, tmax=tmax)
plot(sf, conf.int=TRUE)
## estimates identical to the standard Kaplan-Meier
sfnaive <- survfit(Surv(T) ~ 1, data=dat)
lines(sfnaive, conf.int=TRUE, lty=2, col="red")

[Package flexsurv version 2.3 Index]