berancv {npcure} | R Documentation |
Compute the Cross-Validation Bandwidth for Beran's Estimator of the Conditional Survival
Description
This function computes the cross-validation bandwidth for Beran's estimator of the conditional survival function.
Usage
berancv(x, t, d, dataset, x0, cvpars = controlpars())
Arguments
x |
If |
t |
If |
d |
If |
dataset |
An optional data frame in which the variables named in
|
x0 |
A numeric vector of covariate values where the local cross-validation bandwidth will be computed. |
cvpars |
A list of parameters controlling the process of bandwidth
selection. The default is the value returned by the |
Details
The cross-validation (CV) bandwidth is taken as the largest
local minimizer of the leave-one-out cross-validated criterion in
Geerdens et al. (2018). Let F^{(-i)}(t | x_i)
, i = 1,
\ldots, n
be the Beran estimator obtained using the
data points (x_j, t_j, d_j)
, j = 1, \ldots, i-1, i+1,
\ldots, n
. For the CV criterion,
the differences I(t_i \leq t_j)-F^{(-i)}(t_j|x_i)
are computed only for the so-called 'useful
pairs' of observed times (t_i, t_j)
. A pair (T_i, T_j)
is
useful if the value of the indicator I(T_i \leq T_j)
gives an unambiguous correct value for the indicator I(Y_i
\leq Y_j)
which contains the corresponding true
(possibly unknown) event times, see Geerdens et al. (2018) for
details. Gannoun et al. (2007) apply a similar criterion to perform
bandwidth selection for the Beran estimator, but they consider only
the pairs of true (uncensored) event times. Note that the inclusion of
useful pairs of observed times would be especially advantageous if the
censoring rate is high.
Value
An object of S3 class 'npcure'. Formally, a list of components:
type |
The constant character string c("Cross-validation bandwidth", "survival"). |
x0 |
Grid of covariate values. |
h |
Selected local cross-validation bandwidths. |
hgrid |
Grid of bandwidths used (optional). |
Author(s)
Ignacio López-de-Ullibarri [aut, cre], Ana López-Cheda [aut], Maria Amalia Jácome [aut]
References
Gannoun, A., Saracco, J., Yu, K. (2007). Comparison of kernel estimators of conditional distribution function and quantile regression under censoring. Statistical Modeling, 7: 329-344. https://doi.org/10.1177/1471082X0700700404.
Geerdens, C., Acar, E. F., Janssen, P. (2018). Conditional copula models for right-censored clustered event time data. Biostatistics, 19(2): 247-262. https://doi.org/10.1093/biostatistics/kxx034.
See Also
Examples
## Some artificial data
set.seed(123)
n <- 50
x <- runif(n, -2, 2) ## Covariate values
y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes
c <- rexp(n) ## Censoring values
p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, c), c) ## Observed times
d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator
data <- data.frame(x = x, t = t, d = d)
## Computation of cross-validation (CV) local bandwidth for Beran's
## estimator of survival for covariate values 0, 1, ...
#### ... with the default control parameters (passed through 'cvpars')
x0 <- c(0, 1)
hcv <- berancv(x, t, d, data, x0 = x0)
#### ... changing the default 'cvpars' by calling the 'controlpars()'
#### function:
#### (a) the CV local bandwidth is searched in a grid of 150 bandwidths
#### ('hl = 150') between 0.2 and 4 times the standardized interquartile
#### range of the covariate values of x ('hbound = c(.2, 4'))
#### (b) all the grid bandwidths are saved ('hsave = TRUE')
hcv <- berancv(x, t, d, data, x0 = x0, cvpars = controlpars(hbound =
c(.2, 4), hl = 150, hsave = TRUE))
## Survival estimates for covariate values 0, 1, with CV local bandwidth
S1 <- beran(x, t, d, data, x0 = x0, h = hcv$h)
## Plot predicted survival curves for covariate values 0, 1
plot (S1$testim, S1$S$x0, type = "s", xlab = "Time", ylab = "Survival",
ylim = c(0, 1))
lines(S1$testim, S1$S$x1, type = "s", lty = 2)
## The survival curves are displayed for reference
p0 <- exp(2*x0)/(1 + exp(2*x0))
lines(S1$testim, 1 - p0[1] + p0[1]*pweibull(S1$testim, shape = .5*(x0[1]
+ 4), lower.tail = FALSE), col = 2)
lines(S1$testim, 1 - p0[2] + p0[2]*pweibull(S1$testim, shape = .5*(x0[2]
+ 4), lower.tail = FALSE), lty = 2, col = 2)
legend("topright", c("Estimate, x = 0", "True, x = 0",
"Estimate, x = 1", "True, x = 1"), lty = c(1, 1, 2, 2), col = 1:2)
## Example with the dataset 'bmt' of the 'KMsurv' package to study the
## survival of patients aged 25 and 40.
data("bmt", package = "KMsurv")
x0 <- c(25, 40)
hcv <- berancv(z1, t2, d3, bmt, x0 = x0, cvpars = controlpars(hbound =
c(.2, 4), hl = 150, hsave = TRUE))
S <- beran(z1, t2, d3, bmt, x0 = x0, h = hcv$h, conflevel = .95)
## Plot of predicted survival curves and confidence intervals
plot(S$testim, S$S$x25, type = "s", xlab = "Time", ylab = "Survival",
ylim = c(0, 1))
lines(S$testim, S$conf$x25$lower, type = "s", lty = 2)
lines(S$testim, S$conf$x25$upper, type = "s", lty = 2)
lines(S$testim, S$S$x40, type = "s", lty = 1, col = 2)
lines(S$testim, S$conf$x40$lower, type = "s", lty = 2, col = 2)
lines(S$testim, S$conf$x40$upper, type = "s", lty = 2, col = 2)
legend("topright", c("Age 25: Estimate", "Age 25: 95% CI limits",
"Age 40: Estimate", "Age 40: 95% CI limits"), lty = 1:2,
col = c(1, 1, 2, 2))