riskCurve {pssmooth}    R Documentation

Estimation of Conditional Clinical Endpoint Risk under Placebo and Treatment Given Biomarker Response to Treatment in a Baseline Surrogate Measure Three-Phase Sampling Design

Description

Estimates P{Y(z)=1|S(1)=s_1}, z=0,1, on a grid of s_1 values following the estimation method of Juraska, Huang, and Gilbert (2020), where Z is the treatment group indicator (Z=1, treatment; Z=0, placebo), S(z) is a continuous or ordered categorical univariate biomarker under assignment to Z=z measured at fixed time t_0 after randomization, and Y is a binary clinical endpoint (Y=1, disease; Y=0, no disease) measured after t_0. The estimator employs the generalized product kernel density/probability estimation method of Hall, Racine, and Li (2004) implemented in the np package. The risks P{Y(z)=1|S(z)=s_1,X=x}, z=0,1, where X is a vector of discrete baseline covariates, are estimated by fitting inverse probability-weighted logistic regression models using the osDesign package.

Usage

riskCurve(
  formula,
  bsm,
  tx,
  data,
  pstype = c("continuous", "ordered"),
  bsmtype = c("continuous", "ordered"),
  bwtype = c("fixed", "generalized_nn", "adaptive_nn"),
  hinge = FALSE,
  weights = NULL,
  psGrid = NULL,
  saveFile = NULL,
  saveDir = NULL
)

Arguments

formula

a formula object with the binary clinical endpoint on the left of the ~ operator. The first listed variable on the right must be the biomarker response at t_0, and all variables that follow, if any, are discrete baseline covariates included in all fitted models that condition on them. Interactions and transformations of the baseline covariates are allowed. All terms in the formula must be evaluable in the data frame data.
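The ordering convention above can be checked with base R before calling riskCurve. The following is a minimal sketch only; the object names fm and d and the toy values are hypothetical and are not part of the package.

```r
# Hypothetical formula and toy data frame, for illustration only
fm <- Y ~ S + factor(X)
d <- data.frame(Y = c(0, 1, 0, 1), S = c(1.2, NA, 2.5, 3.1), X = c(0, 1, 1, 0))

# all.vars() lists variable names in order of appearance, response first
vars <- all.vars(fm)
endpoint <- vars[1]         # binary clinical endpoint
biomarker <- vars[2]        # first variable on the right-hand side
covariates <- vars[-(1:2)]  # remaining baseline covariates, if any

# every variable used in the formula must be a column of the data frame
stopifnot(all(vars %in% names(d)))
```

Transformations such as factor(X) are stripped by all.vars(), so the check applies to the underlying column names.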

bsm

a character string specifying the variable name in data representing the baseline surrogate measure

tx

a character string specifying the variable name in data representing the treatment group indicator

data

a data frame with one row per randomized participant endpoint-free at t_0 that contains at least the variables specified in formula, bsm and tx. Values of bsm and the biomarker at t_0 that are unavailable are represented as NA.

pstype

a character string specifying whether the biomarker response shall be treated as a continuous (default) or ordered categorical variable in the kernel density/probability estimation

bsmtype

a character string specifying whether the baseline surrogate measure shall be treated as a continuous (default) or ordered categorical variable in the kernel density/probability estimation

bwtype

a character string specifying the bandwidth type for continuous variables in the kernel density estimation. The options are fixed (default) for fixed bandwidths, generalized_nn for generalized nearest neighbors, and adaptive_nn for adaptive nearest neighbors. As noted in the documentation of the function npcdensbw in the np package: "Adaptive nearest-neighbor bandwidths change with each sample realization in the set when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x."
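The difference between the three bandwidth types can be illustrated with a univariate Gaussian kernel density estimate in base R. This is a simplified sketch, not the np implementation: the number of neighbors k and the fixed bandwidth are chosen by hand here (assumptions made for illustration), whereas np selects bandwidths in a data-driven way.

```r
set.seed(1)
x <- rnorm(200)        # toy sample
pts <- c(-2, 0, 2)     # points at which the density is estimated
k <- 20                # hypothetical number of nearest neighbors

# distance from a point to its k-th nearest value in obs
knn_dist <- function(pt, obs, k) sort(abs(obs - pt))[k]

# fixed: a single bandwidth over the whole support
h_fixed <- bw.nrd0(x)
f_fixed <- sapply(pts, function(g) mean(dnorm((g - x) / h_fixed)) / h_fixed)

# generalized nearest neighbor: one bandwidth per evaluation point
h_gen <- sapply(pts, knn_dist, obs = x, k = k)
f_gen <- mapply(function(g, h) mean(dnorm((g - x) / h)) / h, pts, h_gen)

# adaptive nearest neighbor: one bandwidth per sample point
# (k + 1 skips the zero distance of each point to itself)
h_ada <- sapply(x, knn_dist, obs = x, k = k + 1)
f_ada <- sapply(pts, function(g) mean(dnorm((g - x) / h_ada) / h_ada))
```

In the tails, where observations are sparse, the nearest-neighbor bandwidths widen, while the fixed bandwidth stays constant.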

hinge

a logical value (FALSE by default) indicating whether a hinge model (Fong et al., 2017) shall be used for modeling the effect of S(z) on the clinical endpoint risk. A hinge model specifies that variability in S(z) below the hinge point does not associate with the clinical endpoint risk.
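As an illustration of the hinge specification, the sketch below fits a logistic regression on the transformed biomarker max(s - e, 0) using base R. The hinge point e is assumed known here purely for illustration, and the simulated data and coefficient values are hypothetical; in practice the hinge point is estimated from the data.

```r
set.seed(2)
s <- runif(300, 0, 4)   # toy biomarker values
e <- 1.5                # hypothetical hinge point, assumed known in this sketch

# hinge transform: zero below e, linear above e
hinge_term <- pmax(s - e, 0)

# simulate an endpoint whose risk depends on s only above the hinge point
y <- rbinom(300, 1, plogis(-0.5 - 0.8 * hinge_term))
fit <- glm(y ~ hinge_term, family = binomial)

# fitted risk is constant for all biomarker values at or below e
new_s <- c(0.2, 1.0, 1.5, 3.0)
risk <- predict(fit, newdata = data.frame(hinge_term = pmax(new_s - e, 0)),
                type = "response")
```

The first three entries of risk coincide because their transformed values are all zero.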

weights

either a numeric vector of weights or a character string specifying the variable name in data representing weights applied to observations in the phase 2 subset in order to make inference about the target population of all randomized participants endpoint-free at t_0. The weights reflect that the case:control ratio in the phase 2 subset is different from that in the target population and are passed on to GLMs in the estimation of the hinge point. If NULL (default), weights for cases and controls are calculated separately in each study group.
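One way to form such weights is as inverse sampling fractions within strata defined by study group and case status. The sketch below is an assumption about how weights of this kind could be computed by hand and is not taken from the package source; the toy data frame d and the column name w are hypothetical.

```r
# toy data: Z = study group, Y = endpoint, S = biomarker (NA if not measured)
d <- data.frame(
  Z = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  Y = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  S = c(2.1, NA, 1.8, NA, NA, 2.4, 3.0, 2.7, NA, NA, NA, 3.3)
)

measured <- !is.na(d$S)            # membership in the phase 2 subset
stratum <- interaction(d$Z, d$Y)   # study group by case status

# weight = (participants in stratum) / (participants in stratum with S measured)
n_all <- tapply(measured, stratum, length)
n_meas <- tapply(measured, stratum, sum)
d$w <- ifelse(measured, (n_all / n_meas)[as.character(stratum)], NA)
```

Within each stratum the weights of the measured participants sum to the stratum size, so the weighted phase 2 subset represents all participants in the data frame.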

psGrid

a numeric vector of S(1) values at which the conditional clinical endpoint risk in each study group is estimated. If NULL (default), a grid of values spanning the range of observed values of the biomarker will be used.

saveFile

a character string specifying the name of an .RData file storing the output list. If NULL (default), the output list will only be returned.

saveDir

a character string specifying a path for the output directory. If NULL (default), the output list will only be returned; otherwise, if saveFile is specified, the output list will also be saved as an .RData file in the specified directory.
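A saved output file can be read back with load(), which restores the list under the name it was saved with (oList). The sketch below uses a placeholder list and a temporary directory as stand-ins; the component note is hypothetical and does not reflect the actual contents of the riskCurve output.

```r
# placeholder standing in for the list returned by riskCurve()
oList <- list(note = "placeholder for riskCurve output")

saveDir <- tempdir()        # stand-in for the output directory
saveFile <- "out.RData"
save(oList, file = file.path(saveDir, saveFile))
rm(oList)

# load() restores the object under its saved name and returns that name
loaded <- load(file.path(saveDir, saveFile))
```

Because load() overwrites any existing object called oList in the calling environment, loading into a fresh session or a dedicated environment avoids accidental clobbering.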

Value

If saveFile and saveDir are both specified, the output list (named oList) is saved as an .RData file; otherwise it is returned only. The output object (of class riskCurve) is a list with the following components:

References

Fong, Y., Huang, Y., Gilbert, P. B., and Permar, S. R. (2017), chngpt: threshold regression model estimation and inference, BMC Bioinformatics, 18.

Hall, P., Racine, J., and Li, Q. (2004), Cross-validation and the estimation of conditional probability densities, Journal of the American Statistical Association, 99(468), 1015-1026.

Juraska, M., Huang, Y., and Gilbert, P. B. (2020), Inference on treatment effect modification by biomarker response in a three-phase sampling design, Biostatistics, 21(3), 545-560, https://doi.org/10.1093/biostatistics/kxy074.

See Also

bootRiskCurve, summary.riskCurve and plotMCEPcurve

Examples

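# simulate a two-arm trial; Z is the treatment group indicator
n <- 500
Z <- rep(0:1, each=n/2)
# columns of S: baseline surrogate measure Sb, S(0), and S(1)
S <- MASS::mvrnorm(n, mu=c(2,2,3), Sigma=matrix(c(1,0.9,0.7,0.9,1,0.7,0.7,0.7,1), nrow=3))
# endpoint risk from a probit model in S(0) for placebo and S(1) for treatment
p <- pnorm(drop(cbind(1,Z,(1-Z)*S[,2],Z*S[,3]) %*% c(-1.2,0.2,-0.02,-0.2)))
Y <- sapply(p, function(risk){ rbinom(1,1,risk) })
# discrete baseline covariate
X <- rbinom(n,1,0.5)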
# delete S(1) in placebo recipients
S[Z==0,3] <- NA
# delete S(0) in treatment recipients
S[Z==1,2] <- NA
# generate the indicator of being sampled into the phase 2 subset
phase2 <- rbinom(n,1,0.4)
# delete Sb, S(0) and S(1) in controls not included in the phase 2 subset
S[Y==0 & phase2==0,] <- c(NA,NA,NA)
# delete Sb in cases not included in the phase 2 subset
S[Y==1 & phase2==0,1] <- NA
data <- data.frame(X,Z,S[,1],ifelse(Z==0,S[,2],S[,3]),Y)
colnames(data) <- c("X","Z","Sb","S","Y")
qS <- quantile(data$S, probs=c(0.05,0.95), na.rm=TRUE)
grid <- seq(qS[1], qS[2], length.out=3)

out <- riskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data, psGrid=grid)

# alternatively, to save the .RData output file (no '<-' needed):
riskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data, saveFile="out.RData",
          saveDir="./")



[Package pssmooth version 1.0.3 Index]