bootRiskCurve {pssmooth} R Documentation

Bootstrap Estimation of Conditional Clinical Endpoint Risk under Placebo and Treatment Given Biomarker Response to Treatment in a Baseline Surrogate Measure Three-Phase Sampling Design

Description

Estimates P{Y(z)=1|S(1)=s_1}, z=0,1, on a grid of s_1 values in bootstrap resamples (see riskCurve for an introduction to the notation). Cases (Y=1) and controls (Y=0) are sampled separately, yielding a fixed number of cases and controls in each bootstrap sample. Consequently, the number of controls with available phase 2 data varies across bootstrap samples.
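
The resampling scheme described above can be sketched in base R as follows. This is a minimal illustration and not the package's internal code: the toy data frame dat, the helper name resampleByEndpoint, and the use of is.na(S) to flag missing phase 2 data are all assumptions made for the example.

```r
# Toy data (hypothetical): Y is the endpoint, S the biomarker (NA if unmeasured)
set.seed(1)
n <- 200
dat <- data.frame(Y = rbinom(n, 1, 0.2))
# in this toy setup, cases are always measured and controls with probability 0.4
dat$S <- ifelse(dat$Y == 1 | rbinom(n, 1, 0.4) == 1, rnorm(n), NA)

# Hypothetical helper: resample cases and controls separately, with replacement
resampleByEndpoint <- function(d) {
  cases <- which(d$Y == 1)
  controls <- which(d$Y == 0)
  idx <- c(cases[sample.int(length(cases), replace = TRUE)],
           controls[sample.int(length(controls), replace = TRUE)])
  d[idx, , drop = FALSE]
}

boot <- replicate(5, resampleByEndpoint(dat), simplify = FALSE)

# the numbers of cases and controls are the same in every bootstrap sample
nCases <- sapply(boot, function(b) sum(b$Y == 1))
nControls <- sapply(boot, function(b) sum(b$Y == 0))
# the number of controls with a measured biomarker is free to vary
nControlsPhase2 <- sapply(boot, function(b) sum(b$Y == 0 & !is.na(b$S)))
```

Comparing nControls with nControlsPhase2 across the list boot shows which counts are held fixed by the design and which are not.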

Usage

bootRiskCurve(
  formula,
  bsm,
  tx,
  data,
  pstype = c("continuous", "ordered"),
  bsmtype = c("continuous", "ordered"),
  bwtype = c("fixed", "generalized_nn", "adaptive_nn"),
  hinge = FALSE,
  weights = NULL,
  psGrid = NULL,
  iter,
  seed = NULL,
  saveFile = NULL,
  saveDir = NULL
)

Arguments

formula

a formula object with the binary clinical endpoint on the left of the ~ operator. The first listed variable on the right must be the biomarker response at t_0 and all variables that follow, if any, are discrete baseline covariates specified in all fitted models that condition on them. Interactions and transformations of the baseline covariates are allowed. All terms in the formula must be evaluable in the data frame data.

bsm

a character string specifying the variable name in data representing the baseline surrogate measure

tx

a character string specifying the variable name in data representing the treatment group indicator

data

a data frame with one row per randomized participant endpoint-free at t_0 that contains at least the variables specified in formula, bsm and tx. Values of bsm and the biomarker at t_0 that are unavailable are represented as NA.

pstype

a character string specifying whether the biomarker response shall be treated as a continuous (default) or ordered categorical variable in the kernel density/probability estimation

bsmtype

a character string specifying whether the baseline surrogate measure shall be treated as a continuous (default) or ordered categorical variable in the kernel density/probability estimation

bwtype

a character string specifying the bandwidth type for continuous variables in the kernel density estimation. The options are fixed (default) for fixed bandwidths, generalized_nn for generalized nearest neighbors, and adaptive_nn for adaptive nearest neighbors. As noted in the documentation of the function npcdensbw in the np package: "Adaptive nearest-neighbor bandwidths change with each sample realization in the set when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x."

hinge

a logical value (FALSE by default) indicating whether a hinge model (Fong et al., 2017) shall be used for modeling the effect of S(z) on the clinical endpoint risk. A hinge model specifies that variability in S(z) below the hinge point does not associate with the clinical endpoint risk. The hinge point is reestimated in each bootstrap sample.
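
To make the hinge idea concrete, the sketch below builds a hinge transform and picks a hinge point by a simple grid search over the binomial log-likelihood. It is only an illustration under stated assumptions: the simulated data, the logistic link, the candidate grid, and the names hinge, candidates and eHat are hypothetical, and the grid search is a stand-in rather than the estimation procedure of Fong et al. (2017) or of this package.

```r
set.seed(2)
n <- 400
s <- runif(n, 0, 4)

# hinge transform: zero below the hinge point e, linear above it
hinge <- function(s, e) pmax(s - e, 0)

# simulate a binary endpoint whose risk depends on s only above e = 2
y <- rbinom(n, 1, plogis(-1 - 1.5 * hinge(s, 2)))

# profile the log-likelihood over candidate hinge points (illustrative grid)
candidates <- quantile(s, probs = seq(0.1, 0.9, by = 0.05))
logLiks <- sapply(candidates, function(e) {
  fit <- glm(y ~ hinge(s, e), family = binomial)
  as.numeric(logLik(fit))
})

# the candidate with the largest log-likelihood is taken as the estimate
eHat <- unname(candidates[which.max(logLiks)])
fitHat <- glm(y ~ hinge(s, eHat), family = binomial)
```

Because the transformed covariate is constant below the hinge point, the fitted risk is flat there, which is the property the argument description refers to.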

weights

either a numeric vector of weights or a character string specifying the variable name in data representing weights applied to observations in the phase 2 subset in order to make inference about the target population of all randomized participants endpoint-free at t_0. The weights reflect that the case:control ratio in the phase 2 subset is different from that in the target population and are passed on to GLMs in the estimation of the hinge point. If NULL (default and recommended), weights for cases and controls are recalculated separately in each study group within each bootstrap sample; otherwise the same specified vector of weights is used in each bootstrap sample.
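
One common way to construct such weights is the inverse of the observed phase 2 sampling fraction within each combination of study group and case/control status. The sketch below assumes that construction; it is not confirmed to be the exact formula used internally, and the toy data frame d, the indicator inPhase2 and the column w are hypothetical names.

```r
set.seed(3)
n <- 300
d <- data.frame(Z = rep(0:1, each = n / 2), Y = rbinom(n, 1, 0.25))
# hypothetical phase 2 indicator: cases are sampled more often than controls
d$inPhase2 <- rbinom(n, 1, ifelse(d$Y == 1, 0.9, 0.4)) == 1

# strata defined by study group (Z) and case/control status (Y)
stratum <- interaction(d$Z, d$Y, drop = TRUE)

# observed fraction of each stratum that was sampled into phase 2
samplingFraction <- ave(as.numeric(d$inPhase2), stratum, FUN = mean)

# inverse sampling fraction for phase 2 participants, NA for everyone else
d$w <- ifelse(d$inPhase2, 1 / samplingFraction, NA)
```

With this construction the weights of the phase 2 participants in a stratum add up to the stratum's total size, so the weighted phase 2 subset represents all randomized participants in that stratum. Recomputing d$w inside each bootstrap sample corresponds to the default behaviour described for weights = NULL.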

psGrid

a numeric vector of S(1) values at which the conditional clinical endpoint risk in each study group is estimated. If NULL (default), a grid of values spanning the range of observed values of the biomarker will be used.

iter

the number of bootstrap iterations

seed

a seed of the random number generator supplied to set.seed for reproducibility

saveFile

a character string specifying the name of an .RData file storing the output list. If NULL (default), the output list will only be returned.

saveDir

a character string specifying a path for the output directory. If NULL (default), the output list will only be returned; otherwise, if saveFile is specified, the output list will also be saved as an .RData file in the specified directory.
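
How saveFile and saveDir combine can be sketched in base R as below. This is an assumption about the general pattern and not the package's source code: the object bList here is a placeholder list (the real components are not reproduced), and tempdir() is used only to keep the example from writing into the working directory.

```r
# stand-in for the output list; its contents are purely illustrative
bList <- list(placeholder = 1:3)
saveFile <- "out.RData"
saveDir <- tempdir()

# write to disk only when both a file name and a directory are given
if (!is.null(saveFile) && !is.null(saveDir)) {
  save(bList, file = file.path(saveDir, saveFile))
}

# load() restores the object under the name it was saved with
env <- new.env()
restoredNames <- load(file.path(saveDir, saveFile), envir = env)
```

After loading, the list is available as env$bList, so downstream code should refer to it by that name and not by the file name.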

Value

If saveFile and saveDir are both specified, the output list (named bList) is also saved as an .RData file; otherwise it is only returned. The output object is a list with the following components:

References

Fong, Y., Huang, Y., Gilbert, P. B., and Permar, S. R. (2017), chngpt: threshold regression model estimation and inference, BMC Bioinformatics, 18.

See Also

riskCurve, summary.riskCurve and plotMCEPcurve

Examples

n <- 500
Z <- rep(0:1, each=n/2)
S <- MASS::mvrnorm(n, mu=c(2,2,3), Sigma=matrix(c(1,0.9,0.7,0.9,1,0.7,0.7,0.7,1), nrow=3))
p <- pnorm(drop(cbind(1,Z,(1-Z)*S[,2],Z*S[,3]) %*% c(-1.2,0.2,-0.02,-0.2)))
Y <- sapply(p, function(risk){ rbinom(1,1,risk) })
X <- rbinom(n,1,0.5)
# delete S(1) in placebo recipients
S[Z==0,3] <- NA
# delete S(0) in treatment recipients
S[Z==1,2] <- NA
# generate the indicator of being sampled into the phase 2 subset
phase2 <- rbinom(n,1,0.4)
# delete Sb, S(0) and S(1) in controls not included in the phase 2 subset
S[Y==0 & phase2==0,] <- c(NA,NA,NA)
# delete Sb in cases not included in the phase 2 subset
S[Y==1 & phase2==0,1] <- NA
data <- data.frame(X,Z,S[,1],ifelse(Z==0,S[,2],S[,3]),Y)
colnames(data) <- c("X","Z","Sb","S","Y")
qS <- quantile(data$S, probs=c(0.05,0.95), na.rm=TRUE)
grid <- seq(qS[1], qS[2], length.out=3)

out <- bootRiskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data,
                     psGrid=grid, iter=1, seed=10)

# alternatively, to save the .RData output file (no '<-' needed):
bootRiskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data,
              psGrid=grid, iter=1, seed=10, saveFile="out.RData", saveDir="./")



[Package pssmooth version 1.0.3 Index]