riskCurve {pssmooth}

R Documentation

Estimation of Conditional Clinical Endpoint Risk under Placebo and Treatment Given Biomarker Response to Treatment in a Baseline Surrogate Measure Three-Phase Sampling Design

Description

Estimates P{Y(z)=1|S(1)=s_1}, z=0,1, on a grid of s_1 values following the estimation method of Juraska, Huang, and Gilbert (2020), where Z is the treatment group indicator (Z=1, treatment; Z=0, placebo), S(z) is a continuous or ordered categorical univariate biomarker under assignment to Z=z measured at fixed time t_0 after randomization, and Y is a binary clinical endpoint (Y=1, disease; Y=0, no disease) measured after t_0. The estimator employs the generalized product kernel density/probability estimation method of Hall, Racine, and Li (2004) implemented in the np package. The risks P{Y(z)=1|S(z)=s_1,X=x}, z=0,1, where X is a vector of discrete baseline covariates, are estimated by fitting inverse probability-weighted logistic regression models using the osDesign package.

Usage

riskCurve(
  formula,
  bsm,
  tx,
  data,
  pstype = c("continuous", "ordered"),
  bsmtype = c("continuous", "ordered"),
  bwtype = c("fixed", "generalized_nn", "adaptive_nn"),
  hinge = FALSE,
  weights = NULL,
  psGrid = NULL,
  saveFile = NULL,
  saveDir = NULL
)

Arguments

`formula`	a formula object with the binary clinical endpoint on the left of the `~` operator. The first listed variable on the right must be the biomarker response at `t_0`; all variables that follow, if any, are discrete baseline covariates included in every fitted model that conditions on them. Interactions and transformations of the baseline covariates are allowed. All terms in the formula must be evaluable in the data frame `data`.
`bsm`	a character string specifying the variable name in `data` representing the baseline surrogate measure
`tx`	a character string specifying the variable name in `data` representing the treatment group indicator
`data`	a data frame with one row per randomized participant endpoint-free at `t_0` that contains at least the variables specified in `formula`, `bsm` and `tx`. Values of `bsm` and the biomarker at `t_0` that are unavailable are represented as `NA`.
`pstype`	a character string specifying whether the biomarker response shall be treated as a `continuous` (default) or `ordered` categorical variable in the kernel density/probability estimation
`bsmtype`	a character string specifying whether the baseline surrogate measure shall be treated as a `continuous` (default) or `ordered` categorical variable in the kernel density/probability estimation
`bwtype`	a character string specifying the bandwidth type for continuous variables in the kernel density estimation. The options are `fixed` (default) for fixed bandwidths, `generalized_nn` for generalized nearest neighbors, and `adaptive_nn` for adaptive nearest neighbors. As noted in the documentation of the function `npcdensbw` in the `np` package: "Adaptive nearest-neighbor bandwidths change with each sample realization in the set when estimating the density at the point `x`. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, `x`. Fixed bandwidths are constant over the support of `x`."
`hinge`	a logical value (`FALSE` by default) indicating whether a hinge model (Fong et al., 2017) shall be used for modeling the effect of `S(z)` on the clinical endpoint risk. A hinge model specifies that variability in `S(z)` below the hinge point does not associate with the clinical endpoint risk.
`weights`	either a numeric vector of weights or a character string specifying the variable name in `data` representing weights applied to observations in the phase 2 subset in order to make inference about the target population of all randomized participants endpoint-free at `t_0`. The weights reflect that the case:control ratio in the phase 2 subset is different from that in the target population and are passed on to GLMs in the estimation of the hinge point. If `NULL` (default), weights for cases and controls are calculated separately in each study group.
`psGrid`	a numeric vector of `S(1)` values at which the conditional clinical endpoint risk in each study group is estimated. If `NULL` (default), a grid of values spanning the range of observed values of the biomarker will be used.
`saveFile`	a character string specifying the name of an `.RData` file storing the output list. If `NULL` (default), the output list will only be returned.
`saveDir`	a character string specifying a path for the output directory. If `NULL` (default), the output list will only be returned; otherwise, if `saveFile` is specified, the output list will also be saved as an `.RData` file in the specified directory.
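
The default behaviour of `weights` (separate weights for cases and controls in each study group) can be illustrated with a minimal sketch of inverse sampling probability weights. This is an illustration under stated assumptions, not the package's internal code: the data frame `d`, the indicator `inPhase2`, the sampling fractions, and the helper `ipwByStratum` are all hypothetical.

```r
# Hypothetical illustration: inverse sampling probability weights computed
# separately for cases and controls within each study group.
set.seed(1)
n <- 400
d <- data.frame(
  Z = rep(0:1, each = n / 2),   # treatment group indicator
  Y = rbinom(n, 1, 0.2)         # binary clinical endpoint
)
# assumed phase 2 sampling: all cases, about 30% of controls
d$inPhase2 <- ifelse(d$Y == 1, 1, rbinom(n, 1, 0.3))

ipwByStratum <- function(d) {
  # stratum = study group x case/control status
  stratum <- interaction(d$Z, d$Y, drop = TRUE)
  nAll <- tapply(rep(1, nrow(d)), stratum, sum)   # all participants
  nPhase2 <- tapply(d$inPhase2, stratum, sum)     # phase 2 participants
  key <- as.character(stratum)
  w <- as.numeric(nAll[key] / nPhase2[key])
  # participants outside the phase 2 subset get no weight
  ifelse(d$inPhase2 == 1, w, NA)
}

d$w <- ipwByStratum(d)
# within each stratum, the phase 2 weights sum to the stratum size
tapply(d$w, list(Z = d$Z, Y = d$Y), sum, na.rm = TRUE)
```

A vector such as `d$w`, or the name of the column holding it, is the kind of input the `weights` argument accepts when the default calculation is not appropriate.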

Value

If saveFile and saveDir are both specified, the output list (named oList) is saved as an .RData file; otherwise it is returned only. The output object (of class riskCurve) is a list with the following components:

psGrid: a numeric vector of S(1) values at which the conditional clinical endpoint risk is estimated in the components plaRiskCurve and txRiskCurve
plaRiskCurve: a numeric vector of estimates of P{Y(0)=1|S(1)=s_1} for s_1 in psGrid
txRiskCurve: a numeric vector of estimates of P{Y(1)=1|S(1)=s_1} for s_1 in psGrid
fOptBandwidths: a conbandwidth object returned by the call of the function npcdensbw containing the optimal bandwidths, selected by likelihood cross-validation, in the kernel estimation of the conditional density of S(1) given the baseline surrogate measure and any other specified baseline covariates
gOptBandwidths: a conbandwidth or bandwidth object returned by the call of the function npcdensbw or npudensbw, respectively, containing the optimal bandwidths, selected by likelihood cross-validation, in the kernel estimation of the conditional density of S(0) given any specified baseline covariates or the marginal density of S(0) if no baseline covariates are specified in formula
cpointP: if hinge=TRUE, the estimate of the hinge point in the placebo group
cpointT: if hinge=TRUE, the estimate of the hinge point in the treatment group
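
To make the hinge point reported in cpointP and cpointT concrete, here is a minimal sketch of a hinge-type logistic risk model, in which all biomarker values below the hinge point carry the same risk. The hinge point `cpoint`, the coefficients, and the helper `hingeRisk` are hypothetical; in riskCurve the hinge point is estimated from the data, which this sketch does not attempt.

```r
# Hypothetical hinge-type logistic risk model: the biomarker s affects the
# risk only through its excess over the hinge point cpoint.
hingeRisk <- function(s, cpoint, beta0, beta1) {
  plogis(beta0 + beta1 * pmax(s - cpoint, 0))
}

s <- seq(0, 4, by = 0.5)
risk <- hingeRisk(s, cpoint = 2, beta0 = -1, beta1 = -0.8)

# the risk is flat for s <= 2 and decreases above the hinge point
round(data.frame(s, risk), 3)
```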

References

Fong, Y., Huang, Y., Gilbert, P. B., and Permar, S. R. (2017), chngpt: threshold regression model estimation and inference, BMC Bioinformatics, 18.

Hall, P., Racine, J., and Li, Q. (2004), Cross-validation and the estimation of conditional probability densities, Journal of the American Statistical Association, 99(468): 1015-1026.

Juraska, M., Huang, Y., and Gilbert, P. B. (2020), Inference on treatment effect modification by biomarker response in a three-phase sampling design, Biostatistics, 21(3): 545-560, https://doi.org/10.1093/biostatistics/kxy074.

Examples

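n <- 500
# treatment group indicator: first half placebo (Z=0), second half treatment (Z=1)
Z <- rep(0:1, each=n/2)
# columns of S: baseline surrogate measure Sb, S(0), and S(1)
S <- MASS::mvrnorm(n, mu=c(2,2,3), Sigma=matrix(c(1,0.9,0.7,0.9,1,0.7,0.7,0.7,1), nrow=3))
# endpoint risk from a probit model in Z, S(0) (placebo) and S(1) (treatment)
p <- pnorm(drop(cbind(1,Z,(1-Z)*S[,2],Z*S[,3]) %*% c(-1.2,0.2,-0.02,-0.2)))
# generate the binary clinical endpoint
Y <- sapply(p, function(risk){ rbinom(1,1,risk) })
# generate a binary baseline covariate
X <- rbinom(n,1,0.5)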
# delete S(1) in placebo recipients
S[Z==0,3] <- NA
# delete S(0) in treatment recipients
S[Z==1,2] <- NA
# generate the indicator of being sampled into the phase 2 subset
phase2 <- rbinom(n,1,0.4)
# delete Sb, S(0) and S(1) in controls not included in the phase 2 subset
S[Y==0 & phase2==0,] <- c(NA,NA,NA)
# delete Sb in cases not included in the phase 2 subset
S[Y==1 & phase2==0,1] <- NA
data <- data.frame(X,Z,S[,1],ifelse(Z==0,S[,2],S[,3]),Y)
colnames(data) <- c("X","Z","Sb","S","Y")
qS <- quantile(data$S, probs=c(0.05,0.95), na.rm=TRUE)
grid <- seq(qS[1], qS[2], length.out=3)

out <- riskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data, psGrid=grid)

# alternatively, to save the .RData output file (no '<-' needed):
riskCurve(formula=Y ~ S + factor(X), bsm="Sb", tx="Z", data=data, saveFile="out.RData",
          saveDir="./")
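
# Once riskCurve has returned, the components psGrid, plaRiskCurve and txRiskCurve documented under Value can be combined into a contrast of the two risk curves. The sketch below is illustrative only: `outDemo` is a hand-made stand-in list with placeholder numbers (not actual output of riskCurve), and the multiplicative contrast 1 - risk(1)/risk(0) is one common choice, assumed here for illustration.

```r
# Stand-in for the list returned by riskCurve; the numbers are placeholders
# chosen for illustration, not estimates from any data set.
outDemo <- list(
  psGrid = c(1.5, 3.0, 4.5),
  plaRiskCurve = c(0.10, 0.10, 0.09),
  txRiskCurve = c(0.08, 0.05, 0.03)
)

# multiplicative risk contrast evaluated on the grid of S(1) values
te <- 1 - outDemo$txRiskCurve / outDemo$plaRiskCurve

data.frame(s1 = outDemo$psGrid, te = te)
```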

[Package pssmooth version 1.0.3 Index]