kernel_sievePHaipw {sievePH}

R Documentation

Nonparametric Kernel-Smoothed Stratified Mark-Specific Proportional Hazards Model with a Univariate Continuous Mark, Missing-at-Random in Some Failures

Description

kernel_sievePHaipw implements the estimation methods of Sun and Gilbert (2012) and the hypothesis testing methods of Gilbert and Sun (2015) for a mark-specific proportional hazards model that accommodates missing marks in some failures. The methods allow separate baseline mark-specific hazard functions for different baseline subgroups. Missing marks are handled via an augmented inverse probability weighting (AIPW) approach.
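As a reading aid, the stratified mark-specific proportional hazards model can be written as below. The notation (stratum k, covariate vector z whose first component is `tx`, coefficient function \beta(v)) is an assumption chosen to match the `HR(v)` and `\lambda_{0k}` symbols used elsewhere on this page; consult Sun and Gilbert (2012) for the exact formulation.

```latex
\lambda_k(t, v \mid z) = \lambda_{0k}(t, v)\,\exp\{\beta(v)^\top z\}, \qquad k = 1, \dots, K,
\qquad
HR(v) = \exp\{\beta_1(v)\}, \quad \beta_1(v) = \text{coefficient of } tx .
```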

Usage

kernel_sievePHaipw(
  eventTime,
  eventInd,
  mark,
  tx,
  aux = NULL,
  auxType = NULL,
  zcov = NULL,
  strata = NULL,
  formulaPH = ~tx,
  formulaMiss = NULL,
  formulaAux = NULL,
  tau = NULL,
  tband = NULL,
  hband = NULL,
  nvgrid = 100,
  a = NULL,
  b = NULL,
  ntgrid = NULL,
  nboot = 500,
  seed = NULL,
  maxit = 6
)

Arguments

`eventTime`	a numeric vector specifying the observed, possibly right-censored, event time.
`eventInd`	a numeric vector indicating the event of interest (1 if event, 0 if right-censored).
`mark`	a numeric vector specifying a univariate continuous mark subject to missingness at random. Missing mark values should be set to `NA`. For subjects with `eventInd = 0`, the value in `mark` should also be set to `NA`.
`tx`	a numeric vector indicating the treatment group (1 if treatment, 0 if placebo).
`aux`	a numeric vector specifying a binary or continuous auxiliary covariate that may be useful for predicting missingness, i.e., the probability that the mark is missing, and for informing about the distribution of missing marks. The mark missingness model only requires that the auxiliary covariate be observed in subjects who experienced the event of interest. For subjects with `eventInd = 0`, the value in `aux` may be set to `NA`. If no auxiliary covariate is used, set `aux` to the default of `NULL`.
`auxType`	a character string describing the data type of `aux` if `aux` is used. Data types allowed include "binary" and "continuous". If `aux` is not used, `auxType` should be set to the default of `NULL`.
`zcov`	a data frame with one row per subject specifying possibly time-dependent covariate(s) (not including `tx`). If no covariate is used, `zcov` should be set to the default of `NULL`.
`strata`	a numeric vector specifying baseline strata (`NULL` by default). If specified, a separate mark-specific baseline hazard is assumed for each stratum. Stratification also allows the model for the probability of a complete case and the model for the mark distribution to differ across strata.
`formulaPH`	a one-sided formula object (on the right side of the `~` operator) specifying the linear predictor in the proportional hazards model. Available variables to be used in the formula include `tx` and variable(s) in `zcov`. By default, `formulaPH` is specified as `~ tx`.
`formulaMiss`	a one-sided formula object (on the right side of the `~` operator) specifying the linear predictor in the logistic regression model used for predicting the probability of observing the mark. `formulaMiss` must be provided for the `AIPW` method. Available variables to be used in the formula include `eventTime`, `tx`, `aux`, and variable(s) in `zcov`.
`formulaAux`	a one-sided formula object (on the right side of the `~` operator) specifying the variables used for estimating the conditional distribution of `aux`. If `aux` is binary, the formula specifies the linear predictor in a logistic regression; if `aux` is continuous, the formula provides a symbolic description of the variables used in kernel conditional density estimation. `formulaAux` is optional for the `AIPW` estimation procedure. Available variables to be used in the formula include `eventTime`, `tx`, `mark`, and variable(s) in `zcov`.
`tau`	a numeric value specifying the duration of the study follow-up period. Failures beyond `tau` are treated as right-censored. As a rule of thumb, at least 10% of subjects should remain uncensored by `tau` for the estimation to be stable. By default, `tau` is set to the maximum of `eventTime`.
`tband`	a numeric value between 0 and `tau` specifying the bandwidth of the kernel smoothing function over time. By default, `tband` is set to `(tau - min(eventTime))/5`.
`hband`	a numeric value between 0 and 1 specifying the bandwidth of the kernel smoothing function over the mark. By default, `hband` is set to `4\sigma n^{-1/3}`, where `\sigma` is the estimated standard deviation of the observed marks for uncensored failure times and `n` is the number of subjects in the dataset. Larger bandwidths are recommended for higher percentages of missing marks.
`nvgrid`	an integer value (100 by default) specifying the number of equally spaced mark values between the minimum and maximum of the observed mark for which the treatment effects are evaluated.
`a`	a numeric value between the minimum and maximum of the observed mark values specifying the lower bound of the range for testing the null hypotheses `H_{10}: HR(v) = 1` and `H_{20}: HR(v)` does not depend on `v`, for `v \in [a, b]`. By default, `a` is set to `(max(mark) - min(mark))/nvgrid + min(mark)`.
`b`	a numeric value between the minimum and maximum of the observed mark values specifying the upper bound of the range for testing the null hypotheses `H_{10}: HR(v) = 1` and `H_{20}: HR(v)` does not depend on `v`, for `v \in [a, b]`. By default, `b` is set to `max(mark)`.
`ntgrid`	an integer value (`NULL` by default) specifying the number of equally spaced time points for which the mark-specific baseline hazard functions are evaluated. If `NULL`, baseline hazard functions are not evaluated.
`nboot`	an integer specifying the number of bootstrap iterations (500 by default) used to simulate the distributions of the test statistics. If `NULL`, the hypothesis tests are not performed.
`seed`	an integer specifying the random number generation seed for reproducing the test statistics and p-values. By default, a specific seed is not set.
`maxit`	an integer specifying the maximum number of iterations attempted for convergence in estimation. The default is 6.
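The documented defaults for `tau`, `tband`, `hband`, `a` and `b` can be recomputed by hand, which helps when choosing non-default values. The following is a minimal sketch that mirrors the formulas stated above. The helper name `default_tuning()` is hypothetical and not part of sievePH; taking `n = length(eventTime)` and using only the non-missing marks among failures (i.e., dropping `NA`s before `max()`, `min()` and `sd()`) are assumptions about how the package counts subjects and observed marks.

```r
# Hypothetical helper (not part of sievePH): recompute the documented defaults
default_tuning <- function(eventTime, eventInd, mark, nvgrid = 100) {
  n <- length(eventTime)                          # number of subjects in the dataset
  obsMark <- mark[eventInd == 1 & !is.na(mark)]   # observed marks among failures (assumption)
  tau <- max(eventTime)                           # default follow-up duration
  tband <- (tau - min(eventTime)) / 5             # default bandwidth over time
  hband <- 4 * sd(obsMark) * n^(-1/3)             # default bandwidth over the mark
  a <- (max(obsMark) - min(obsMark)) / nvgrid + min(obsMark)  # default lower test bound
  b <- max(obsMark)                               # default upper test bound
  list(tau = tau, tband = tband, hband = hband, a = a, b = b)
}

# Toy input: 6 subjects, 4 failures, one failure with a missing mark
eventTime <- c(0.5, 1.2, 2.0, 2.7, 3.0, 1.8)
eventInd  <- c(1, 1, 0, 1, 0, 1)
mark      <- c(0.2, NA, NA, 0.8, NA, 0.5)
str(default_tuning(eventTime, eventInd, mark, nvgrid = 20))
```

With `nvgrid = 20`, the lower bound `a` sits one grid step above the smallest observed mark, so the test range excludes the sparse left edge of the mark distribution.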

Details

kernel_sievePHaipw analyzes data from a randomized placebo-controlled trial that evaluates treatment efficacy for a time-to-event endpoint with a continuous mark. The parameter of interest is the ratio of the conditional mark-specific hazard functions (treatment/placebo), which is based on a stratified mark-specific proportional hazards model. This model assumes no parametric form for either the baseline hazard function or the treatment effect across mark values. For data with missing marks, the estimation procedure leverages auxiliary predictors of whether the mark is observed and augments the IPW estimator with auxiliary predictors of the missing mark value.
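The inverse probability weighting (IPW) ingredient can be illustrated with base R: among failures, fit a logistic regression for the indicator that the mark is observed (the role played by `formulaMiss`) and weight each complete case by the inverse of its fitted probability. This is a minimal sketch assuming a main-effects logistic model on `eventTime`, `tx` and a binary auxiliary `A`; it omits the kernel smoothing and the augmentation term that kernel_sievePHaipw adds, and the helper `ipw_weights()` is hypothetical.

```r
# Hypothetical helper (not part of sievePH): IPW weights for complete cases
ipw_weights <- function(eventTime, eventInd, mark, tx, A) {
  fail <- eventInd == 1
  df <- data.frame(R = as.numeric(!is.na(mark[fail])),  # 1 if the mark is observed
                   eventTime = eventTime[fail], tx = tx[fail], A = A[fail])
  fitMiss <- glm(R ~ eventTime + tx + A, family = binomial, data = df)
  piHat <- fitted(fitMiss)          # estimated P(mark observed | eventTime, tx, A)
  w <- numeric(length(eventTime))   # censored subjects and missing marks keep weight 0
  w[fail] <- df$R / piHat
  w
}

# Simulated data with a missing-at-random mark
set.seed(1)
n <- 300
tx <- rep(0:1, each = n / 2)
eventTime <- rexp(n)
eventInd <- rbinom(n, 1, 0.7)
A <- ifelse(eventInd == 1, rbinom(n, 1, 0.5), NA)
mark <- ifelse(eventInd == 1, runif(n), NA)
pObs <- plogis(1 + 0.4 * tx - 0.2 * A)         # observation probability depends on tx and A
mark[eventInd == 1 & runif(n) > pObs] <- NA
w <- ipw_weights(eventTime, eventInd, mark, tx, A)
summary(w[eventInd == 1 & !is.na(mark)])
```

Complete cases in subgroups where marks are often missing receive larger weights, so they stand in for the failures whose marks were not observed.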

Value

An object of class kernel_sievePH which can be processed by summary.kernel_sievePH to obtain or print a summary of the results. An object of class kernel_sievePH is a list containing the following components:

H10: a data frame with test statistics (first row) and corresponding p-values (second row) for testing H_{10}: HR(v) = 1 for v \in [a, b]. Columns TSUP1 and Tint1 include test statistics and p-values for testing H_{10} vs. H_{1a}: HR(v) \neq 1 for some v \in [a, b] (general alternative). Columns TSUP1m and Tint1m include test statistics and p-values for testing H_{10} vs. H_{1m}: HR(v) \leq 1 with strict inequality for some v \in [a, b] (monotone alternative). TSUP1 and TSUP1m are based on extensions of the classic Kolmogorov-Smirnov supremum-based test. Tint1 and Tint1m are based on generalizations of the integration-based Cramer-von Mises test. Tint1 and Tint1m involve integration of deviations over the whole range of the mark. If nboot is NULL, H10 is returned as NULL.
H20: a data frame with test statistics (first row) and corresponding p-values (second row) for testing H_{20}: HR(v) does not depend on v \in [a, b]. Columns TSUP2 and Tint2 include test statistics and p-values for testing H_{20} vs. H_{2a}: HR(v) depends on v \in [a, b] (general alternative). Columns TSUP2m and Tint2m include test statistics and p-values for testing H_{20} vs. H_{2m}: HR(v) increases as v increases over [a, b] (monotone alternative). TSUP2 and TSUP2m are based on extensions of the classic Kolmogorov-Smirnov supremum-based test. Tint2 and Tint2m are based on generalizations of the integration-based Cramer-von Mises test. Tint2 and Tint2m involve integration of deviations over the whole range of the mark. If nboot is NULL, H20 is returned as NULL.
estBeta: a data frame summarizing point estimates and standard errors of the mark-specific coefficients for treatment at equally-spaced values between the minimum and the maximum of the observed mark values.
cBproc1: a data frame containing equally-spaced mark values in the column Mark, test processes Q^{(1)}(v) for observed data in the column Observed, and Q^{(1)}(v) for nboot independent sets of normal samples in the columns S1, S2, \cdots. If nboot is NULL, cBproc1 is returned as NULL.
cBproc2: a data frame containing equally-spaced mark values in the column Mark, test processes Q^{(2)}(v) for observed data in the column Observed, and Q^{(2)}(v) for nboot independent sets of normal samples in the columns S1, S2, \cdots. If nboot is NULL, cBproc2 is returned as NULL.
Lambda0: an array of dimension K x nvgrid x ntgrid for the kernel-smoothed baseline hazard function \lambda_{0k}, k = 1, \dots, K where K is the number of strata. If ntgrid is NULL (by default), Lambda0 is returned as NULL.
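To show how supremum- and integration-based statistics and their p-values relate to a test process laid out like `cBproc1` (columns `Mark`, `Observed`, `S1`, `S2`, ...), the sketch below computes a Kolmogorov-Smirnov-type statistic `max|Q(v)|` and a Cramer-von Mises-type statistic (trapezoidal integral of `Q(v)^2` over the mark grid) for the observed process and for every simulated column, then reports the proportion of simulated statistics at least as large as the observed one. The exact weighting and normalization used by sievePH are not stated on this page, so the helper `sup_int_pvalues()` and its formulas are illustrative assumptions; the one-sided (monotone) versions are not covered.

```r
# Hypothetical helper (not part of sievePH): generic sup- and integral-type tests
sup_int_pvalues <- function(proc) {
  v <- proc$Mark
  trap <- function(y) sum(diff(v) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid rule on the grid
  stat <- function(q) c(TSUP = max(abs(q)), Tint = trap(q^2))
  obs <- stat(proc$Observed)
  simCols <- grep("^S[0-9]+$", names(proc))              # simulated process columns
  sims <- sapply(simCols, function(j) stat(proc[[j]]))   # 2 x nboot matrix
  pval <- rowMeans(sims >= obs)                          # share of simulated stats >= observed
  data.frame(TSUP = c(obs[["TSUP"]], pval[["TSUP"]]),
             Tint = c(obs[["Tint"]], pval[["Tint"]]),
             row.names = c("statistic", "p-value"))
}

# Toy process: random walks stand in for the simulated Gaussian processes
set.seed(2)
v <- seq(0, 1, length.out = 21)
proc <- data.frame(Mark = v, Observed = 2 * sin(pi * v))
for (k in 1:200) proc[[paste0("S", k)]] <- cumsum(rnorm(21, sd = 0.2))
sup_int_pvalues(proc)
```

The supremum statistic reacts to a sharp departure at a single mark value, while the integrated statistic accumulates moderate departures spread over the whole range, which is why both are reported.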

References

Gilbert, P. B. and Sun, Y. (2015). Inferences on relative failure rates in stratified mark-specific proportional hazards models with missing marks, with application to human immunodeficiency virus vaccine efficacy trials. Journal of the Royal Statistical Society Series C: Applied Statistics, 64(1), 49-73.

Sun, Y. and Gilbert, P. B. (2012). Estimation of stratified mark-specific proportional hazards models with missing marks. Scandinavian Journal of Statistics, 39(1), 34-52.

Yang, G., Sun, Y., Qi, L. and Gilbert, P. B. (2017). Estimation of stratified mark-specific proportional hazards models under two-phase sampling with application to HIV vaccine efficacy trials. Statistics in Biosciences, 9, 259-283.

Examples

library(sievePH)

set.seed(20240410)
beta <- 2.1
gamma <- -1.3
n <- 200
tx <- rep(0:1, each = n / 2)
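# exponential failure times with hazard 0.2 (placebo) and 0.2 * exp(gamma) (treatment)
tm <- c(rexp(n / 2, 0.2), rexp(n / 2, 0.2 * exp(gamma)))
# uniform dropout censoring plus administrative censoring at time 3
cens <- runif(n, 0, 15)
eventTime <- pmin(tm, cens, 3)
eventInd <- as.numeric(tm <= pmin(cens, 3))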
alpha <- function(b){ log((1 - exp(-2)) * (b - 2) / (2 * (exp(b - 2) - 1))) }
mark0 <- log(1 - (1 - exp(-2)) * runif(n / 2)) / (-2)
mark1 <- log(1 + (beta - 2) * (1 - exp(-2)) * runif(n / 2) / (2 * exp(alpha(beta)))) /
  (beta - 2)
mark <- ifelse(eventInd == 1, c(mark0, mark1), NA)
# the true TE(v) curve underlying the data-generating mechanism is:
# TE(v) = 1 - exp{alpha(beta) + beta * v + gamma}

# a binary auxiliary covariate
A <- sapply(exp(-0.5 - 0.2 * mark) / (1 + exp(-0.5 - 0.2 * mark)),
            function(p){ ifelse(is.na(p), NA, rbinom(1, 1, p)) })
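# probability of observing the mark depends on tx and A (missing at random)
linPred <- 1 + 0.4 * tx - 0.2 * A
probs <- exp(linPred) / (1 + exp(linPred))
R <- rep(NA, n)
# redraw the observation indicators until at least 10 marks are observed
while (sum(R, na.rm = TRUE) < 10){
  R[eventInd == 1] <- sapply(probs[eventInd == 1],
                             function(p){ rbinom(1, 1, p) })
}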
# a missing-at-random mark
mark[eventInd == 1] <- ifelse(R[eventInd == 1] == 1, mark[eventInd == 1], NA)

# AIPW estimation; auxiliary covariate is used (not required)
fit <- kernel_sievePHaipw(eventTime, eventInd, mark, tx, aux = A,
                          auxType = "binary", formulaMiss = ~ eventTime,
                          formulaAux = ~ eventTime + tx + mark,
                          tau = 3, tband = 0.5, hband = 0.3, nvgrid = 20,
                          nboot = 20)
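Because the example data are simulated, the true curve stated in the comment, TE(v) = 1 - exp{alpha(beta) + beta * v + gamma}, can be evaluated on a grid and compared with the estimated curve. A minimal sketch follows; it re-declares `alpha()`, `beta` and `gamma` exactly as in the example, while the grid of 20 mark values on [0, 1] is an assumption chosen to match `nvgrid = 20` and the support of the simulated marks.

```r
# True data-generating quantities copied from the example above
alpha <- function(b){ log((1 - exp(-2)) * (b - 2) / (2 * (exp(b - 2) - 1))) }
beta <- 2.1
gamma <- -1.3

v <- seq(0, 1, length.out = 20)                 # assumed evaluation grid on the mark support
trueLogHR <- alpha(beta) + beta * v + gamma     # true mark-specific log hazard ratio
trueTE <- 1 - exp(trueLogHR)                    # TE(v) = 1 - HR(v)
head(data.frame(mark = v, logHR = trueLogHR, TE = trueTE))
```

Since `beta > 0`, the true efficacy declines as the mark increases, so a well-behaved fit should show estimated efficacy highest at low mark values.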

[Package sievePH version 1.1 Index]