coxed.npsf {coxed} | R Documentation |
Predict expected durations using the GAM method
Description
This function is called by coxed
and is not intended to be used by itself.
Usage
coxed.npsf(cox.model, newdata = NULL, coef = NULL, b.ind = NULL)
Arguments
cox.model |
The output from a Cox proportional hazards model estimated
with the |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used |
coef |
A vector of new coefficients to replace the |
b.ind |
A vector of observation numbers to pass to the estimation sample to construct the a bootstrapped sample with replacement |
Details
The non-parametric step function (NPSF) approach to calculating expected durations from the Cox proportional hazards model, described in Kropko and Harden (2018), uses the method proposed by Cox and Oakes (1984, 107-109) for estimating the cumulative baseline hazard function. This method is nonparametric and results in a step-function representation of the cumulative baseline hazard.
Cox and Oakes (1984, 108) show that the cumulative baseline hazard function can be estimated after fitting a Cox model by
\hat{H}_0(t) = \sum_{\tau_j < t}\frac{d_j}{\sum_{l\in\Re(\tau_j)}\hat{\psi}(l)},
where \tau_j
represents time points earlier than t
, d_j
is a count of the
total number of failures at \tau_j
, \Re(\tau_j)
is the remaining risk set at \tau_j
,
and \hat{\psi}(l)
represents the ELP from the Cox model for observations still in the
risk set at \tau_j
. This equation is used calculate the cumulative baseline hazard at
all time points in the range of observed durations. This estimate is a stepwise function
because time points with no failures do not contribute to the cumulative hazard, so the function
is flat until the next time point with observed failures.
We extend this method to obtain expected durations by first calculating the baseline survivor function from the cumulative hazard function, using
\hat{S}_0(t) = \exp[-\hat{H}_0(t)].
Each observation's survivor function is related to the baseline survivor function by
\hat{S}_i(t) = \hat{S}_0(t)^{\hat{\psi}(i)},
where \hat{\psi}(i)
is the exponentiated linear predictor (ELP) for observation i
.
These survivor functions can be used directly to calculate expected durations for each
observation. The expected value of a non-negative random variable can be calculated by
E(X) = \int_0^{\infty} \bigg(1 - F(t)\bigg)dt,
where F(.)
is the cumulative distribution function for X
. In the case of a
duration variable t_i
, the expected duration is
E(t_i) = \int_0^T S_i(t)\,dt,
where T
is the largest possible duration and S(t)
is the individual's survivor
function. We approximate this integral with a right Riemann-sum by calculating the survivor
functions at every discrete time point from the minimum to the maximum observed durations,
and multiplying these values by the length of the interval between time points with observed failures:
E(t_i) \approx \sum_{t_j \in [0,T]} (t_j - t_{j-1})S_i(t_j).
Value
Returns a list containing the following components:
exp.dur | A vector of predicted mean durations for the estimation sample
if newdata is omitted, or else for the specified new data. |
baseline.functions | The estimated cumulative baseline hazard function and survivor function. |
Author(s)
Jonathan Kropko <jkropko@virginia.edu> and Jeffrey J. Harden <jharden2@nd.edu>
References
Kropko, J. and Harden, J. J. (2018). Beyond the Hazard Ratio: Generating Expected Durations from the Cox Proportional Hazards Model. British Journal of Political Science https://doi.org/10.1017/S000712341700045X
Cox, D. R., and Oakes, D. (1984). Analysis of Survival Data. Monographs on Statistics & Applied Probability
See Also
Examples
mv.surv <- Surv(martinvanberg$formdur, event = rep(1, nrow(martinvanberg)))
mv.cox <- coxph(mv.surv ~ postel + prevdef + cont + ident + rgovm + pgovno + tpgovno +
minority, method = "breslow", data = martinvanberg)
ed <- coxed.npsf(mv.cox)
ed$baseline.functions
ed$exp.dur
#Running coxed.npsf() on a bootstrap sample and with new coefficients
bsample <- sample(1:nrow(martinvanberg), nrow(martinvanberg), replace=TRUE)
newcoefs <- rnorm(8)
ed2 <- coxed.npsf(mv.cox, b.ind=bsample, coef=newcoefs)