curve_cont {contsurvplot} | R Documentation |
Estimate Counterfactual Survival or Failure Probabilities for Levels of a Continuous Variable
Description
This function can be utilized to estimate counterfactual survival curves or cumulative incidence functions (CIF) for specific values of a continuous covariate.
Usage
curve_cont(data, variable, model, horizon,
times, group=NULL, cause=1, cif=FALSE,
contrast="none", reference="km", ref_value=NULL,
event_time=NULL, event_status=NULL,
conf_int=FALSE, conf_level=0.95,
n_boot=300, n_cores=1,
na.action=options()$na.action,
return_boot=FALSE, ...)
Arguments
data |
A |
variable |
A single character string specifying the continuous variable of interest, for which the survival curves should be estimated. This variable has to be contained in the |
model |
A model describing the time-to-event process (such as an |
horizon |
A numeric vector containing a range of values of |
times |
A numeric vector containing points in time at which the survival probabilities should be calculated. |
group |
An optional single character string specifying a factor variable in |
cause |
The cause of interest. In standard survival data with only one event type, this should be kept at 1. For data with multiple failure types, this argument should be specified. In addition, the |
cif |
Whether to calculate the cumulative incidence (CIF) instead of the survival probability. If multiple failure types are present, the survival probability cannot be estimated in an unbiased way. In those cases, this argument should always be set to |
contrast |
Defines what kind of estimate should be returned. Can be either |
reference |
Defines what kind of reference value to use when estimating causal contrasts. Only used if |
ref_value |
A single number corresponding to the reference value used when estimating causal contrasts using |
event_time |
A single character string specifying the time until the occurrence of the event of interest or |
event_status |
A single character string specifying the status of the event of interest or |
conf_int |
Whether to calculate point-wise confidence intervals or not. If |
conf_level |
A number specifying the confidence level of the bootstrap confidence intervals. Ignored if |
n_boot |
A single integer specifying how many bootstrap repetitions should be performed. Ignored if |
n_cores |
The number of processor cores to use when performing the calculations. If |
na.action |
How missing values should be handled. Can be one of: |
return_boot |
Either |
... |
Further arguments passed to |
Details
This function is used internally in all plot functions included in this R-package and generally does not need to be called directly by the user. It can however be used to get specific values or as a basis to create custom plots not included in this package. Below we give a small introduction to what this function does. A more detailed description of the underlying methodology can be found in our article on this subject (Denz & Timmesfeld 2022).
Target Estimand (default)
By default (contrast="none"
) this function tries to estimate the survival probability at times
that would have been observed if every individual in the sample had received a value of horizon
in the variable
. Let Z
be the continuous variable we are interested in. Let T
be the time until the occurrence of the event of interest. Under the potential outcome framework, there is an uncountably infinite amount of potential survival times T^{(Z=z)}
, one for each possible value of Z
. The target estimand is then defined as:
S_{z}(t) = E(I(T^{(Z=z)} > t))
If we additionally consider a categorical group
variable D
, the target estimand is similarly defined as:
S_{zd}(t) = E(I(T^{(Z=z, D=d)} > t))
where T^{(Z=z, D=d)}
is the survival time that would have been observed if the individual had received both Z = z
and D = d
.
Target Estimand (using contrasts)
If contrasts are used (contrast!="none"
), target estimands based on S_{z}(t)
or S_{zd}(t)
are used instead. When using contrast="diff"
and reference="km"
, the target estimand is simply the difference between the observed survival probability in the entire sample (denoted by S(t)
, estimated using a Kaplan-Meier estimator) and the counterfactual survival probability:
\Delta_{KM}(t, z) = S(t) - S_z(t)
If contrast="ratio"
is used instead, the substraction sign is simply replace by a division. If group
was specified, S(t)
is replaced by S_d(t)
(a stratified Kaplan-Meier estimator) and S_z(t)
is replaced by S_{zd}(t)
. Instead of using a Kaplan-Meier estimator as reference
, one may also use a specific S_z(t)
as reference using reference="value"
and setting ref_value
to the Z
that should be used. All of these causal contrasts and their implications are described in detail in the appendix of our article on this topic (Denz & Timmesfeld 2022).
Estimation Methodology
G-Computation, also known as the Corrected Group Prognosis method, Direct-Standardization or G-Formula, is used internally to estimate the counterfactual survival probability or CIF for values of a continuous variable. This is done by setting the variable
to a specific value for all rows in data
first. Afterwards, the model
is used to predict the survival probability of each individual at all times
, given the value of variable
and their other observed covariates (which are included in the model as independent variables). These estimates are then averaged for each time point. This procedure is repeated for every value in horizon
. If a group
is supplied, these calculations are repeated for every possible value in group
, with the estimated individual survival probabilities also being conditional on that value.
To obtain valid estimates of the target estimand using this function, the fundamental causal identifiability assumptions have to be met. Those are described in detail in our article on this topic (Denz & Timmesfeld 2022). If those assumptions are not met, the estimates may only be used to showcase simple associations. They cannot be endowed with a counterfactual interpretation in this case.
Supported Models
This function relies on the predictRisk
function from the riskRegression package to create the covariate and time specific estimates of the probabilities. All models with an associated predictRisk
method may be used in this function. This includes a variety of models, such as the Cox proportional hazards regression model and the aalen additive hazards model. If the model that should be used has no predictRisk
method, the user either needs to write their own predictRisk
method or contact the maintainers of the riskRegression package.
Bootstrap Confidence Intervals
By using conf_int=TRUE
, bootstrap confidence intervals may be estimated. This will draw n_boot
samples of size nrow(data)
with replacement from data
in the first step. Afterwards, the model
is fit to each of those bootstrap samples and the curve_cont
function is recursively called to obtain the estimates of interests for each sample. Using the percentile approach, the bootstrap confidence intervals are calculated. This requires that the model
object contains a call
parameter, because the update()
function is used internally.
Computational Complexity
If many values are included in times
, horizon
or both and/or data
has a lot of rows, this function may become very slow. Since bootstrapping relies on sampling with replacement from the original data
and repeating the entire procedure n_boot
times, this also dramatically increases the runtime. Parallel processing may be used to speed up the computations. This can be done by simply setting n_cores
to values higher than 1.
Missing Data
Currently, this function does not support advanced handling of missing data. The data.frame
supplied to data
should contain no missing data in the relevant columns. To achieve this, the na.action
argument can be set to "na.omit"
, which will remove all rows that do contain missing data.
Competing Events
By supplying a model
that directly takes into account competing events and using the cause
argument, the user may also use the functionality offered in this package to create plots in this setting. Internally, the predictRisk
method will then be used to estimate the conditional cause-specific cumulative incidence function, which will then be used to carry out the g-computation step explained above. However the underlying target estimand of this procedure is dependent on which kind of model was supplied. It is therefore not possible to define it here concisely. Future research is necessary to clarify this point. This feature should only be used with great caution.
Value
Returns a data.frame
containing the columns time
(the point in time), est
(the estimated survival probability or CIF) and cont
(the specific value of variable
used). If group
was used, it includes the additional group
column, specifying the level of the grouping variable. If conf_int=TRUE
was used, it additionally includes the columns se
(bootstrap standard error of the estimate), ci_lower
(lower limit of the bootstrap confidence interval) and ci_upper
(upper limit of the bootstrap confidence interval).
Author(s)
Robin Denz
References
Robin Denz, Nina Timmesfeld (2023). "Visualizing the (Causal) Effect of a Continuous Variable on a Time-To-Event Outcome". In: Epidemiology 34.5
Brice Ozenne, Anne Lyngholm Sorensen, Thomas Scheike, Christian Torp-Pedersen and Thomas Alexander Gerds. riskRegression: Predicting the Risk of an Event using Cox Regression Models. The R Journal (2017) 9:2, pages 440-460.
I-Ming Chang, Rebecca Gelman, and Marcello Pagano. Corrected Group Prognostic Curves and Summary Statistics. Journal of Chronic Diseases (1982) 35, pages 669-674
James Robins. A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period: Application to Control of the Healthy Worker Survivor Effect. Mathematical Modelling (1986) 7, pages 1393-1512.
See Also
Examples
library(contsurvplot)
library(riskRegression)
library(survival)
# using data from the survival package
data(nafld, package="survival")
# take a random sample to keep example fast
set.seed(42)
nafld1 <- nafld1[sample(nrow(nafld1), 150), ]
# fit cox-model with age
model <- coxph(Surv(futime, status) ~ age, data=nafld1, x=TRUE)
# estimate survival probability at some points in time, for
# a range of age values
plotdata <- curve_cont(data=nafld1,
variable="age",
model=model,
horizon=c(50, 60, 70, 80),
times=c(1000, 2000, 3000, 4000))
# estimate cumulative incidences instead
plotdata <- curve_cont(data=nafld1,
variable="age",
model=model,
horizon=c(50, 60, 70, 80),
times=c(1000, 2000, 3000, 4000),
cif=TRUE)