
probcurehboot {npcure}

R Documentation

Compute the Bootstrap Bandwidth for the Nonparametric Estimator of the Cure Probability

Description

This function computes the bootstrap bandwidth for the nonparametric estimator of the conditional probability of cure.

Usage

probcurehboot(x, t, d, dataset, x0, bootpars = controlpars())

Arguments

`x`	If `dataset` is missing, a numeric object giving the covariate values. If `dataset` is a data frame, it is interpreted as the name of the variable corresponding to the covariate in the data frame.
`t`	If `dataset` is missing, a numeric object giving the observed times. If `dataset` is a data frame, it is interpreted as the name of the variable corresponding to the observed times in the data frame.
`d`	If `dataset` is missing, an integer object giving the values of the uncensoring indicator. Censored observations must be coded as 0, uncensored ones as 1. If `dataset` is a data frame, it is interpreted as the name of the variable corresponding to the uncensoring indicator in the data frame.
`dataset`	An optional data frame in which the variables named in `x`, `t` and `d` are interpreted. If it is missing, `x`, `t` and `d` must be objects of the workspace.
`x0`	A numeric vector of covariate values where the local bootstrap bandwidth will be computed.
`bootpars`	A list of parameters controlling the process of bandwidth selection. The default is the value returned by the `controlpars` function called without arguments.

Details

The function computes the bootstrap bandwidth selector for the nonparametric estimator of the cure probability at the covariate values given by `x0`. The bootstrap bandwidth is the minimizer of a bootstrap version of the Mean Squared Error (MSE) of the cure rate estimator, which is approximated by Monte Carlo simulation from a large number, B, of bootstrap resamples. The bootstrap MSE is the bootstrap expectation of the squared difference between the value of the cure rate estimator computed with the bootstrap sample over a grid of bandwidths and its value computed with the original sample and a pilot bandwidth. The bootstrap resamples are generated by the simple weighted bootstrap resampling method with the covariate values held fixed, which is equivalent to the simple weighted bootstrap of Li and Datta (2001).

All the parameters involved in the bootstrap bandwidth selection process (number of bootstrap resamples, grid of bandwidths, and pilot bandwidth) are typically set through the `controlpars` function, whose output is passed to the `bootpars` argument. See the help of `controlpars` for details.
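
The selection procedure described above can be illustrated with a small base-R sketch. It is not the npcure implementation: the Gaussian kernel, the constant pilot bandwidth `g`, the linear bandwidth grid and the helper names `nw_weights`, `cure_hat` and `boot_resample` are all assumptions made for illustration. The cure probability estimator is written as a kernel-weighted product-limit estimator taken over all observed times, which is one reading of López-Cheda et al. (2017).

```r
## Illustrative sketch only (base R); not the code used by npcure.
set.seed(42)

## Artificial data generated as in the Examples section
n <- 60
x <- runif(n, -2, 2)                      # covariate
y <- rweibull(n, shape = .5 * (x + 4))    # true lifetimes
cens <- rexp(n)                           # censoring values
p <- exp(2 * x) / (1 + exp(2 * x))        # probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, cens), cens)   # observed times
d <- ifelse(u < p & y < cens, 1, 0)       # uncensoring indicator

## Nadaraya-Watson weights at x0 (Gaussian kernel: an assumption)
nw_weights <- function(x, x0, h) {
  k <- dnorm((x - x0) / h)
  k / sum(k)
}

## Kernel-weighted product-limit estimate of the cure probability at x0
cure_hat <- function(x, t, d, x0, h) {
  o <- order(t, -d)                       # ties: uncensored first
  w <- nw_weights(x[o], x0, h)
  atrisk <- rev(cumsum(rev(w)))           # weight still at risk at each time
  prod(1 - d[o] * w / atrisk)
}

## One weighted bootstrap resample with the covariate values kept fixed:
## each (t, d) pair is redrawn with weights centred at the i-th covariate
boot_resample <- function(x, t, d, g) {
  idx <- vapply(seq_along(x), function(i) {
    sample.int(length(x), 1, prob = nw_weights(x, x[i], g))
  }, integer(1))
  list(t = t[idx], d = d[idx])
}

x0 <- 0                                   # point where the bandwidth is selected
g <- 1                                    # pilot bandwidth (hypothetical value)
hgrid <- seq(0.3, 2, length.out = 15)     # bandwidth grid (hypothetical)
B <- 200                                  # number of bootstrap resamples

q_pilot <- cure_hat(x, t, d, x0, g)       # original sample, pilot bandwidth

## Squared errors: one row per grid bandwidth, one column per resample
sqerr <- replicate(B, {
  bs <- boot_resample(x, t, d, g)
  q_star <- vapply(hgrid, function(h) cure_hat(x, bs$t, bs$d, x0, h),
                   numeric(1))
  (q_star - q_pilot)^2
})

mse_boot <- rowMeans(sqerr)               # Monte Carlo bootstrap MSE
h_boot <- hgrid[which.min(mse_boot)]      # bootstrap bandwidth at x0
```

Repeating the last steps for every value in a vector of covariate points would give one local bandwidth per point, which is the shape of the `h` component returned by `probcurehboot`.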

Because a bandwidth is selected separately at each value of `x0`, estimates computed with the resulting set of bootstrap bandwidths may sometimes look wiggly. To counter this behavior, the selected vector of bootstrap bandwidths can be smoothed by computing a moving average, whose order is set through `controlpars`. The smoothed bandwidths are then returned in the `hsmooth` component of the value.
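
The smoothing step can be pictured with a centered moving average in base R. This is a sketch under assumptions: `moving_average` is a hypothetical helper, the bandwidth values are made up, and shrinking the window at the ends of the vector is one possible convention that may differ from what npcure does.

```r
## Centered moving average of order k; the window is truncated at the ends
## (an assumption, not necessarily the rule used by npcure)
moving_average <- function(h, k) {
  n <- length(h)
  lo <- floor((k - 1) / 2)      # points taken to the left
  hi <- ceiling((k - 1) / 2)    # points taken to the right
  vapply(seq_len(n), function(i) {
    mean(h[max(1, i - lo):min(n, i + hi)])
  }, numeric(1))
}

h <- c(0.8, 1.6, 0.7, 1.5, 0.9, 1.4, 1.0)   # made-up wiggly bandwidths
hs <- moving_average(h, k = 3)

## Compare the spread of the raw and the smoothed bandwidths
range(h)
range(hs)
```

A larger order averages over more neighbouring covariate points, trading local adaptivity for a smoother bandwidth curve.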

Value

An object of S3 class 'npcure'. Formally, a list with the following components:

`type`	The constant character vector `c("Bootstrap bandwidth", "cure")`.
`x0`	Grid of covariate values.
`h`	Selected local bootstrap bandwidths.
`hsmooth`	Smoothed selected local bootstrap bandwidths (optional).
`hgrid`	Grid of bandwidths used (optional).
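
As a quick way to see these components, the following sketch runs the selector on artificial data like that of the Examples section and inspects the result. It assumes the npcure package is installed (the code is skipped otherwise). The small values of `B` and `hl` are chosen only to keep the run short and are not recommended settings.

```r
if (requireNamespace("npcure", quietly = TRUE)) {
  ## Artificial data generated as in the Examples section
  set.seed(123)
  n <- 50
  x <- runif(n, -2, 2)
  y <- rweibull(n, shape = .5 * (x + 4))
  cens <- rexp(n)
  p <- exp(2 * x) / (1 + exp(2 * x))
  u <- runif(n)
  dat <- data.frame(x = x,
                    t = ifelse(u < p, pmin(y, cens), cens),
                    d = ifelse(u < p & y < cens, 1, 0))
  vecx0 <- seq(-1, 1, by = .5)

  ## Ask for the bandwidth grid ('hsave') and for smoothing ('hsmooth')
  hb <- npcure::probcurehboot(x, t, d, dat, x0 = vecx0,
                              bootpars = npcure::controlpars(B = 99, hl = 20,
                                                             hsave = TRUE,
                                                             hsmooth = 3))

  print(class(hb))    # S3 class of the returned object
  str(unclass(hb))    # components actually present in the list
}
```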

Author(s)

Ignacio López-de-Ullibarri [aut, cre], Ana López-Cheda [aut], Maria Amalia Jácome [aut]

References

Li, G., Datta, S. (2001). A bootstrap approach to nonparametric regression for right censored data. Annals of the Institute of Statistical Mathematics, 53: 708–729. https://doi.org/10.1023/A:1014644700806.

López-Cheda, A., Cao, R., Jácome, M. A., Van Keilegom, I. (2017). Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis, 105: 144–165. https://doi.org/10.1016/j.csda.2016.08.002.

Examples

## Some artificial data
set.seed(123)
n <- 50
x <- runif(n, -2, 2) ## Covariate values
y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes
c <- rexp(n) ## Censoring values
p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, c), c) ## Observed times
d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator
data <- data.frame(x = x, t = t, d = d)

## A vector of covariate values 
vecx0 <- seq(-1.5, 1.5, by = .1)

## Computation of bootstrap local bandwidth at the values of 'vecx0'...
#### ... with the default control parameters
set.seed(1) ## Not needed, just for reproducibility.
hb1 <- probcurehboot(x, t, d, data, x0 = vecx0)

#### ... changing the default 'bootpars' through 'controlpars()', with
#### arguments:
#### (a) 'B = 1999' (1999 bootstrap resamples are generated),
#### (b) 'hbound = c(.2, 4)' and 'hl = 50' (a grid of 50 bandwidths
#### between 0.2 and 4 times the standardized interquartile range of
#### the covariate values is built),
#### (c) 'hsave = TRUE' (the grid bandwidths are saved), and
#### (d) 'hsmooth = 7' (the bootstrap bandwidths are smoothed by a
#### moving average of order 7)
set.seed(1) ## Not needed, just for reproducibility.
hb2 <- probcurehboot(x, t, d, data, x0 = vecx0,
                     bootpars = controlpars(B = 1999, hbound = c(.2, 4),
                                            hl = 50, hsave = TRUE,
                                            hsmooth = 7))

## Estimates of the conditional probability of cure at the covariate
## values of 'vecx0' with the selected bootstrap bandwidths
q1 <- probcure(x, t, d, data, x0 = vecx0, h = hb1$h)
q2 <- probcure(x, t, d, data, x0 = vecx0, h = hb2$h)
q2sm <- probcure(x, t, d, data, x0 = vecx0, h = hb2$hsmooth)

## A plot comparing the estimates obtained with the bootstrap bandwidths
plot(q1$x0, q1$q, type = "l", xlab = "Covariate",
     ylab = "Cure probability", ylim = c(0, 1))
lines(q2$x0, q2$q, type = "l", lty = 2)
lines(q2sm$x0, q2sm$q, type = "l", lty = 3)
lines(q1$x0, 1 - exp(2*q1$x0)/(1 + exp(2*q1$x0)), col = 2)
legend("topright",
       c("Estimate with 'hb1'", "Estimate with 'hb2'",
         "Estimate with 'hb2' smoothed", "True"),
       lty = c(1, 2, 3, 1), col = c(1, 1, 1, 2))


## Example with the dataset 'bmt' of the 'KMsurv' package
## to study the probability of cure as a function of the age (z1).
data("bmt", package = "KMsurv")
x0 <- seq(quantile(bmt$z1, .05), quantile(bmt$z1, .95), length.out = 100)
## This might take a while
hb <- probcurehboot(z1, t2, d3, bmt, x0 = x0,
                    bootpars = controlpars(B = 1999, hbound = c(.2, 2),
                                           hl = 50, hsave = TRUE,
                                           hsmooth = 10))
q.age <- probcure(z1, t2, d3, bmt, x0 = x0, h = hb$h)
q.age.smooth <- probcure(z1, t2, d3, bmt, x0 = x0, h = hb$hsmooth)

## Plot of estimated cure probability
plot(q.age$x0, q.age$q, type = "l", ylim = c(0, 1),
     xlab = "Patient age (years)", ylab = "Cure probability")
lines(q.age.smooth$x0, q.age.smooth$q, col = 2)
legend("topright",
       c("Estimate with h bootstrap", "Estimate with smoothed h bootstrap"),
       lty = 1, col = 1:2)

[Package npcure version 0.1-5 Index]