probcurehboot {npcure}    R Documentation

Compute the Bootstrap Bandwidth for the Nonparametric Estimator of the Cure Probability

Description

This function computes the bootstrap bandwidth for the nonparametric estimator of the conditional probability of cure.

Usage

probcurehboot(x, t, d, dataset, x0, bootpars = controlpars())

Arguments

x

If dataset is missing, a numeric object giving the covariate values. If dataset is a data frame, x is interpreted as the name of the variable corresponding to the covariate in the data frame.

t

If dataset is missing, a numeric object giving the observed times. If dataset is a data frame, t is interpreted as the name of the variable corresponding to the observed times in the data frame.

d

If dataset is missing, an integer object giving the values of the uncensoring indicator. Censored observations must be coded as 0, uncensored ones as 1. If dataset is a data frame, d is interpreted as the name of the variable corresponding to the uncensoring indicator in the data frame.

dataset

An optional data frame in which the variables named in x, t and d are interpreted. If it is missing, x, t and d must be objects of the workspace.

x0

A numeric vector of covariate values where the local bootstrap bandwidth will be computed.

bootpars

A list of parameters controlling the process of bandwidth selection. The default is the value returned by the controlpars function called without arguments.

Details

The function computes the bootstrap bandwidth selector for the nonparametric estimator of the cure probability at the covariate values given by x0. The bootstrap bandwidth is the minimizer of a bootstrap version of the Mean Squared Error (MSE) of the cure rate estimator, which is approximated by Monte Carlo by simulating a large number, B, of bootstrap resamples. The bootstrap MSE is the bootstrap expectation of the squared difference between the value of the cure rate estimator computed with the bootstrap sample at each bandwidth of a grid and its value computed with the original sample and a pilot bandwidth. The bootstrap resamples are generated with the simple weighted bootstrap, keeping the covariate values fixed; this is equivalent to the simple weighted bootstrap of Li and Datta (2001). All the parameters involved in the bootstrap bandwidth selection process (number of bootstrap resamples, grid of bandwidths, and pilot bandwidth) are typically set through the controlpars function, whose output is passed to the bootpars argument. See the help of controlpars for details.
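In symbols, the criterion described above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the package: \hat{q}_h denotes the cure probability estimator with bandwidth h, g the pilot bandwidth, \hat{q}^{*(b)}_h the estimator computed from the b-th bootstrap resample, and h_1, ..., h_L the grid of bandwidths.

```latex
\[
\mathrm{MSE}^{*}_{x_0}(h)
  = E^{*}\!\left[\left(\hat{q}^{*}_{h}(x_0) - \hat{q}_{g}(x_0)\right)^{2}\right]
  \approx \frac{1}{B}\sum_{b=1}^{B}
      \left(\hat{q}^{*(b)}_{h}(x_0) - \hat{q}_{g}(x_0)\right)^{2},
\qquad
h^{*}_{x_0} = \arg\min_{h \in \{h_1,\dots,h_L\}} \mathrm{MSE}^{*}_{x_0}(h).
\]
```

The minimization is carried out separately at each value of x0, which is why the selected bandwidths are local.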

Given the local nature of bootstrap bandwidth selection, estimates obtained from sets of bootstrap bandwidths may sometimes look wiggly. To counter this behavior, the selected vector of bootstrap bandwidths can be smoothed by computing a moving average (its order being set by controlpars). In that case, the smoothed bandwidths are returned in the hsmooth component of the returned value.
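As a rough illustration of what moving-average smoothing does to a vector of local bandwidths, the following base R sketch applies a centered moving average with stats::filter(). The bandwidth values and the order k are made up for the example, and the treatment of the end points here (left as NA) is an assumption of the sketch; this page does not state how npcure handles them.

```r
## Illustrative only: made-up local bandwidths, not package output.
h <- c(0.8, 1.4, 0.6, 1.2, 0.7, 1.3, 0.9)

## Order of the moving average (hypothetical value chosen for the sketch).
k <- 3

## Centered moving average: each interior value becomes the mean of
## itself and its neighbours within a window of k points.
hs <- as.numeric(stats::filter(h, rep(1 / k, k), sides = 2))

## stats::filter() leaves NA at both ends, where the window is incomplete.
hs
```

The smoothed vector varies less from one covariate value to the next than the raw one, which is the effect the hsmooth component is meant to provide.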

Value

An object of S3 class 'npcure'. Formally, a list with the following components:

type

The constant character string c("Bootstrap bandwidth", "cure").

x0

Grid of covariate values.

h

Selected local bootstrap bandwidths.

hsmooth

Smoothed selected local bootstrap bandwidths (optional).

hgrid

Grid of bandwidths used (optional).

Author(s)

Ignacio López-de-Ullibarri [aut, cre], Ana López-Cheda [aut], Maria Amalia Jácome [aut]

References

Li, G., Datta, S. (2001). A bootstrap approach to nonparametric regression for right censored data. Annals of the Institute of Statistical Mathematics, 53: 708–729. https://doi.org/10.1023/A:1014644700806.

López-Cheda, A., Cao, R., Jácome, M. A., Van Keilegom, I. (2017). Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis, 105: 144–165. https://doi.org/10.1016/j.csda.2016.08.002.

See Also

controlpars, probcure

Examples

## Some artificial data
set.seed(123)
n <- 50
x <- runif(n, -2, 2) ## Covariate values
y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes
c <- rexp(n) ## Censoring values
p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, c), c) ## Observed times
d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator
data <- data.frame(x = x, t = t, d = d)

## A vector of covariate values 
vecx0 <- seq(-1.5, 1.5, by = .1)

## Computation of bootstrap local bandwidth at the values of 'vecx0'...
#### ... with the default control parameters
set.seed(1) ## Not needed, just for reproducibility.
hb1 <- probcurehboot(x, t, d, data, x0 = vecx0)

#### ... changing the default 'bootpars' through 'controlpars()', with
#### arguments:
#### (a) 'B = 1999' (1999 bootstrap resamples are generated),
#### (b) 'hbound = c(.2, 4)' and 'hl = 50' (a grid of 50 bandwidths
#### between 0.2 and 4 times the standardized interquartile range of
#### the covariate values is built),
#### (c) 'hsave = TRUE' (the grid of bandwidths is saved), and
#### (d) 'hsmooth = 7' (the bootstrap bandwidths are smoothed by a
#### moving average of order 7)
set.seed(1) ## Not needed, just for reproducibility.
hb2 <- probcurehboot(x, t, d, data, x0 = vecx0,
                     bootpars = controlpars(B = 1999, hbound = c(.2, 4),
                                            hl = 50, hsave = TRUE,
                                            hsmooth = 7))

## Estimates of the conditional probability of cure at the covariate
## values of 'vecx0' with the selected bootstrap bandwidths
q1 <- probcure(x, t, d, data, x0 = vecx0, h = hb1$h)
q2 <- probcure(x, t, d, data, x0 = vecx0, h = hb2$h)
q2sm <- probcure(x, t, d, data, x0 = vecx0, h = hb2$hsmooth)

## A plot comparing the estimates obtained with the bootstrap bandwidths
plot(q1$x0, q1$q, type = "l", xlab = "Covariate",
     ylab = "Cure probability", ylim = c(0, 1))
lines(q2$x0, q2$q, type = "l", lty = 2)
lines(q2sm$x0, q2sm$q, type = "l", lty = 3)
lines(q1$x0, 1 - exp(2*q1$x0)/(1 + exp(2*q1$x0)), col = 2)
legend("topright", c("Estimate with 'hb1'", "Estimate with 'hb2'",
                     "Estimate with 'hb2' smoothed", "True"),
       lty = c(1, 2, 3, 1), col = c(1, 1, 1, 2))


## Example with the dataset 'bmt' of the 'KMsurv' package
## to study the probability of cure as a function of the age (z1).
data("bmt", package = "KMsurv")
x0 <- seq(quantile(bmt$z1, .05), quantile(bmt$z1, .95), length.out = 100)
## This might take a while
hb <- probcurehboot(z1, t2, d3, bmt, x0 = x0,
                    bootpars = controlpars(B = 1999, hbound = c(.2, 2),
                                           hl = 50, hsave = TRUE,
                                           hsmooth = 10))
q.age <- probcure(z1, t2, d3, bmt, x0 = x0, h = hb$h)
q.age.smooth <- probcure(z1, t2, d3, bmt, x0 = x0, h = hb$hsmooth)

## Plot of estimated cure probability
plot(q.age$x0, q.age$q, type = "l", ylim = c(0, 1),
     xlab = "Patient age (years)", ylab = "Cure probability")
lines(q.age.smooth$x0, q.age.smooth$q, col = 2)
legend("topright", c("Estimate with h bootstrap",
"Estimate with smoothed h bootstrap"), lty = 1, col = 1:2)


[Package npcure version 0.1-5 Index]