caseCohortCoxSurvival {CaseCohortCoxSurvival}

R Documentation

Parameter estimation and variance for case-cohort analyses under the Cox model

Description

Estimates the parameters (log-relative hazard, baseline hazards, cumulative baseline hazard, pure risks) and their variances (robust, or accounting for the sampling features) from cohort or case-cohort data, under the Cox model.

Usage

 caseCohortCoxSurvival(data, status, time, 
                       cox.phase1=NULL, cox.phase2=NULL, other.covars=NULL, 
                       strata=NULL, weights.phase2=NULL, calibrated=FALSE, 
                       subcohort=NULL, subcohort.strata.counts=NULL,
                       predict=TRUE, predicted.cox.phase2=NULL,
                       predictors.cox.phase2=NULL,
                       aux.vars=NULL, aux.method="Shin",
                       phase3=NULL, strata.phase3=NULL,
                       weights.phase3=NULL, weights.phase3.type="both",
                       Tau1=NULL, Tau2=NULL, x=NULL,
                       weights.op=NULL, print=1)

Arguments

`data`	Data frame containing the cohort and all variables needed for the analysis.
`status`	Column name in `data` giving the case status for each individual in the cohort. This variable must be coded as 0 for non-cases and 1 for cases.
`time`	Column name(s) in `data` giving the time to event for each individual in the cohort. One variable is required for a time-on-study time scale; two variables are required for an age time scale, with the first variable giving the start age and the second the end age.
`cox.phase1`	Column name(s) in `data` giving the Cox model covariates measured on the entire cohort. See covariates and prediction in details.
`cox.phase2`	Column name(s) in `data` giving the Cox model covariates measured only on phase-two individuals. See covariates and prediction in details.
`other.covars`	Column name(s) in data giving other covariates measured on the entire cohort that might be useful, alone or in combination with `cox.phase1`, if predicted values of the phase-two covariates (`cox.phase2`) need to be obtained on the whole cohort for the weight calibration.
`strata`	NULL or column name in data with the stratum value for each individual in the cohort. The number of strata used for the sampling of the subcohort equals the number of different stratum values. For example, a stratum variable might take values 0,1,2,3 or 4. The default is NULL.
`weights.phase2`	NULL or column name in data giving the phase-two design weights for each individual in the cohort. For a whole cohort analysis (see `subcohort` below), weights are not used in the `coxph` call. If NULL but subcohort is not NULL, `subcohort.strata.counts` will be used to estimate `weights.phase2`. The default is NULL.
`calibrated`	TRUE or FALSE to calibrate the `weights`. Calibrated weights will be computed using the function `calibration`. If TRUE, then `phase3` (below) will be set to NULL. See calibration in details. The default is FALSE.
`subcohort`	NULL or column name in `data` giving the indicators of membership in the subcohort. The indicators are 1 if the individual belongs to the subcohort and 0 otherwise. Some cases might be in the subcohort and others not. If NULL, then a whole cohort analysis will be performed. The default is NULL.
`subcohort.strata.counts`	NULL or a list of the number of individuals sampled into the subcohort from each stratum of strata. The names in the list must be the strata values and the length of the list must be equal to the number of strata. If NULL, then the count for each stratum is estimated by the number of subcohort individuals in each stratum. The default is NULL.
`predict`	TRUE or FALSE to predict the phase-two covariates using `predictors.cox.phase2`. This option is only used when `calibrated=TRUE`, `aux.vars=NULL` and `predicted.cox.phase2=NULL`. If `calibrated=TRUE`, `aux.vars=NULL` and `predict=FALSE`, then `predicted.cox.phase2` must be specified. See covariates and prediction in details. The default is TRUE.
`predicted.cox.phase2`	NULL or a named list giving the predicted values of the phase-two covariates (`cox.phase2`) on the whole cohort. For example, if the phase-two covariates are `X1` and `X2`, then the list is of the form `list(X1=X1.pred, X2=X2.pred)`, where `X1.pred` and `X2.pred` are the predictions of `X1` and `X2` respectively. This option is only used when `calibrated=TRUE` and `aux.vars=NULL`. If `calibrated=TRUE`, `aux.vars=NULL` and `predict=FALSE`, then `predicted.cox.phase2` must be specified and must not contain missing values. The default is NULL.
`predictors.cox.phase2`	NULL, a vector, or a list specifying the columns in data to use as predictor variables for obtaining the predicted values on the whole cohort for the phase-two covariates (`cox.phase2`). A list allows for different proxy variables to be used for the different phase-two covariates. The selected predictor variables must be from among `cox.phase1` and `other.covars`. See examples and covariates and prediction in details. If NULL, then the phase-two covariates will be predicted using `cox.phase1` and `other.covars`. If NULL, `cox.phase1=NULL` and `other.covars=NULL`, then the calibrated analysis will not be performed. This option is only used when `calibrated=TRUE`, `aux.vars=NULL`, `predicted.cox.phase2=NULL` and `predict=TRUE`. The default is NULL.
`aux.vars`	NULL or column name(s) in data giving the auxiliary variables for each individual in the cohort. This option is only used when `calibrated=TRUE`. If NULL, then auxiliary variables will be constructed using method Breslow or Shin and predicted values on the whole cohort for the phase-two covariates (see `aux.method`, `predict`, `predicted.cox.phase2` and `predictors.cox.phase2`). `aux.vars` must not contain missing values. The default is NULL.
`aux.method`	"Breslow", or "Shin" to specify the algorithm to construct the auxiliary variables. This option is only used if `aux.vars=NULL` and `calibrated=TRUE`. The default is "Shin".
`phase3`	NULL or column name in `data` giving the indicators of membership in the phase-three sample. The indicators are 1 if the individual belongs to the phase-three sample and 0 otherwise. All individuals in the phase-three sample must also belong to the phase-two sample. This option is not used if `calibrated=TRUE`. The default is NULL.
`strata.phase3`	NULL or column name in `data` giving the phase-three stratification for each individual in phase-two. The number of strata used for the third phase of sampling equals the number of different stratum values. The default is NULL.
`weights.phase3`	NULL or column name in `data` giving the phase-three design weights for each individual in phase-two. If NULL but `phase3` is not NULL, then `phase3` and `subcohort` will be used to estimate `weights.phase3` (see details in `estimation.weights.phase3`). The default is NULL.
`weights.phase3.type`	One of NULL, "design", "estimated", or "both" to specify whether the phase-three weights are design weights (known), or to be estimated. The variance estimation differs for estimated and design weights. If set to "both", then both variance estimates will be computed. If not NULL, then only the first letter is matched for this option. The default is "both".
`Tau1`	NULL or left bound of the time interval considered for the cumulative baseline hazard and the pure risk. If NULL, then the first event time is used.
`Tau2`	NULL or right bound of the time interval considered for the cumulative baseline hazard and the pure risk. If NULL, then the last event time is used.
`x`	Data frame containing `cox.phase1` and `cox.phase2` variables for which pure risk is estimated. The default is NULL so that no pure risk estimates will be computed.
`weights.op`	NULL or a list of options for calibrating the phase-two design weights or estimating the phase-three design weights. The available options are `niter.max` and `epsilon.stop` (see `calibration` or `estimation.weights.phase3`). The default is NULL.
`print`	0-3 to print information as the analysis is performed. The larger the value, the more information will be printed. To not print any information, set `print = 0`. The default is 1.
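
As a rough illustration of how `strata`, `subcohort` and `subcohort.strata.counts` could translate into phase-two design weights, the sketch below assumes stratified simple random sampling of the subcohort, so that a non-case in stratum j receives weight n_j / m_j (cohort size over subcohort size in that stratum) and every case receives weight 1. The toy data and the helper `design.weights` are hypothetical and not part of the package, whose internal computation may differ.

```r
# Hypothetical helper (not part of CaseCohortCoxSurvival): inverse sampling
# fractions under stratified simple random sampling of the subcohort.
design.weights <- function(strata, status, subcohort, counts = NULL) {
  strata <- as.character(strata)
  n <- c(table(strata))                      # cohort size per stratum
  if (is.null(counts)) {
    # Fall back on the observed number of subcohort members per stratum
    counts <- as.list(table(strata[subcohort == 1]))
  }
  m <- unlist(counts)[names(n)]              # align the counts with the strata
  w <- as.numeric(n[strata] / m[strata])     # n_j / m_j for each individual
  w[status == 1] <- 1                        # assumption: all cases are included
  w
}

# Toy cohort with two strata (W = 0 and W = 1)
cohort <- data.frame(
  W         = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  status    = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0),
  subcohort = c(1, 1, 0, 0, 1, 0, 0, 1, 0, 0)
)
w <- design.weights(cohort$W, cohort$status, cohort$subcohort,
                    counts = list("0" = 2, "1" = 2))
w
```

With `counts = NULL` the helper uses the observed subcohort sizes, which give the same weights for this toy cohort.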

Details

The different scenarios covered by the function are:
1) Whole cohort (subcohort = NULL)

2) (stratified) case-cohort (= stratified phase-two sample with no missing covariate data)
a. With design weights (subcohort, strata, calibrated = FALSE)
b. With calibrated weights and proxies to predict phase-two covariates and the auxiliary variables (subcohort, strata, calibrated=TRUE, predict=TRUE, predictors.cox.phase2, aux.method)
c. With calibrated weights and externally supplied predicted values of phase-two covariates (calibrated=TRUE, strata, predict=FALSE, predicted.cox.phase2)

3) (unstratified) case-cohort (= unstratified phase-two sample with no missing covariate data)
a. With design weights (subcohort, strata=NULL, calibrated=FALSE)
b. With calibrated weights and proxies to predict phase-two covariates and obtain the auxiliary variables (subcohort, strata=NULL, calibrated=TRUE, predict=TRUE, predictors.cox.phase2, aux.method)
c. With calibrated weights and externally supplied predicted values of phase-two covariates (calibrated=TRUE, strata=NULL, predict=FALSE, predicted.cox.phase2)

4) Case-cohort (= phase-three sample, because of missing covariate information in phase-two data, with stratified or unstratified phase-two sampling)
a. With known phase-three design weights (subcohort, strata, phase3, strata.phase3, weights.phase3.type="design")
b. With estimated phase-three design weights (subcohort, strata, phase3, strata.phase3, weights.phase3.type="estimated")

covariates and prediction
Prediction of phase-two covariates is performed when calibrated = TRUE, predict = TRUE, aux.vars = NULL and predicted.cox.phase2 = NULL. If predictors.cox.phase2 = NULL, all the covariates measured on the entire cohort are used for the prediction (see cox.phase1 and other.covars). Phase-two covariates are predicted by linear regression for a continuous variable, logistic regression for a binary variable, and the function multinom for a categorical variable. Categorical covariates should not be coded as dummy variables, because each dummy variable would then be predicted by its own independent logistic (or linear) regression.
Alternatively, predicted values of phase-two covariates on the whole cohort can be specified with predicted.cox.phase2.
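
The prediction step described above can be sketched in base R. This is a minimal illustration on simulated data, assuming unweighted regressions fitted on a simple random phase-two sample and fitted probabilities as the predicted values of a binary covariate; the variable names and the simulated cohort are hypothetical, and the package may fit these models differently.

```r
set.seed(1)
n <- 500
# Simulated cohort: the proxy and W are known for everyone (assumption)
cohort <- data.frame(X1.proxy = rnorm(n), W = rbinom(n, 1, 0.5))
cohort$X1 <- cohort$X1.proxy + rnorm(n, sd = 0.5)                     # continuous
cohort$X2 <- rbinom(n, 1, plogis(-0.5 + cohort$X1.proxy + cohort$W))  # binary

# Pretend X1 and X2 are only measured on 150 phase-two individuals
phase2 <- cohort[sample(n, 150), ]

# Continuous covariate: linear regression on the phase-two data
fit.X1 <- lm(X1 ~ X1.proxy, data = phase2)
# Binary covariate: logistic regression on the phase-two data
fit.X2 <- glm(X2 ~ X1.proxy + W, family = binomial, data = phase2)

# Predicted values for every cohort member, arranged as the named list
# described for the predicted.cox.phase2 argument
predicted <- list(
  X1 = predict(fit.X1, newdata = cohort),
  X2 = predict(fit.X2, newdata = cohort, type = "response")
)
str(predicted)
```

A list built this way could be supplied as `predicted.cox.phase2` together with `predict = FALSE`, provided it contains no missing values.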

calibration
Calibrating the design weights against informative auxiliary variables, measured on all cohort members, can increase efficiency. When calibrated = TRUE, the user can either provide the auxiliary variables (aux.vars) or let the function build them (aux.method). Construction of the auxiliary variables follows Breslow et al. (2009) or Shin et al. (2020) (see aux.method), and relies on predictions of the phase-two covariates for all members of the cohort (see covariates and prediction above). The auxiliary variables are given by (i) the influences for the log-relative hazard parameters estimated from the Cox model with imputed cohort data; and (ii) the products of total follow-up time (on the time interval for which pure risk is to be estimated) with the estimated relative hazard for the imputed cohort data, where the log-relative hazard parameters are estimated from the Cox model with case-cohort data and weights calibrated with (i). When aux.method = "Breslow", calibration of the design weights is against (i), as proposed by Breslow et al. (2009) to improve efficiency of case-cohort estimates of relative hazard. When aux.method = "Shin", calibration is against (i) and (ii), as proposed by Shin et al. (2020) to improve efficiency of relative hazard and pure risk estimates under the nested case-control design.
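
To make the idea of weight calibration concrete, the sketch below rakes design weights so that the weighted phase-two totals of the auxiliary variables match their whole-cohort totals, solving the calibration equations by Newton-Raphson. Raking is one standard calibration technique and is used here as an assumption; the helper `rake.weights`, its defaults, and the randomly generated auxiliary variables are hypothetical (real auxiliary variables would be the influences and follow-up-time products described above). Only the option names `niter.max` and `epsilon.stop` are taken from `weights.op`.

```r
# Hypothetical raking helper: find eta such that
#   sum_{i in phase two} w_i * exp(eta' A_i) * A_i = sum_{i in cohort} A_i
rake.weights <- function(A.phase2, w, total, niter.max = 100, epsilon.stop = 1e-8) {
  eta <- rep(0, ncol(A.phase2))
  for (iter in seq_len(niter.max)) {
    w.cal <- w * exp(drop(A.phase2 %*% eta))
    f <- colSums(w.cal * A.phase2) - total       # calibration equations
    if (sqrt(sum(f^2)) < epsilon.stop) break     # converged
    J <- crossprod(A.phase2, w.cal * A.phase2)   # Jacobian of f in eta
    eta <- eta - solve(J, f)                     # Newton-Raphson update
  }
  w * exp(drop(A.phase2 %*% eta))
}

set.seed(2)
n <- 1000
# Placeholder auxiliary variables (intercept plus two columns), whole cohort
A <- cbind(1, rnorm(n), rbinom(n, 1, 0.3))
in.phase2 <- rbinom(n, 1, 0.2) == 1              # simple random phase-two sample
w <- rep(1 / 0.2, sum(in.phase2))                # design weights

w.cal <- rake.weights(A[in.phase2, ], w, total = colSums(A))

# Weighted phase-two totals before and after calibration, next to cohort totals
rbind(design     = colSums(w * A[in.phase2, ]),
      calibrated = colSums(w.cal * A[in.phase2, ]),
      cohort     = colSums(A))
```

Including a column of ones among the auxiliary variables forces the calibrated weights to sum to the cohort size.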

Note
If subcohort = NULL, then a whole cohort analysis will be run and only robust variance estimates will be computed.

Value

A list with class casecohortcoxsurv containing:

`beta`	Log-relative hazard estimates
`Lambda0`	Cumulative baseline hazard estimate in [Tau1, Tau2]
`beta.var`	Influence-based variance estimate for beta
`Lambda0.var`	Influence-based variance estimate for Lambda0
`beta.var.estimated`	Influence-based variance estimate for beta with estimated phase-three weights
`Lambda0.var.estimated`	Influence-based variance estimate for Lambda0 with estimated phase-three weights
`beta.var.design`	Influence-based variance estimate for beta with design phase-three weights
`Lambda0.var.design`	Influence-based variance estimate for Lambda0 with design phase-three weights
`beta.robustvar`	Robust variance estimate for beta
`Lambda0.robustvar`	Robust variance estimate for Lambda0
`beta.robustvar.estimated`	Robust variance estimate for beta with estimated phase-three weights
`Lambda0.robustvar.estimated`	Robust variance estimate for Lambda0 with estimated phase-three weights
`beta.robustvar.design`	Robust variance estimate for beta with design phase-three weights
`Lambda0.robustvar.design`	Robust variance estimate for Lambda0 with design phase-three weights
`Pi.var`	Matrix of pure risk estimates in [Tau1, Tau2] and variance estimates
`Pi.var.estimated`	Matrix of pure risk estimates in [Tau1, Tau2] and variance estimates with estimated phase-three weights
`Pi.var.design`	Matrix of pure risk estimates in [Tau1, Tau2] and variance estimates with design phase-three weights
`coxph.fit`	Return object from coxph of the model fit
`changed.times`	Matrix of original and new event times for individuals who had their event times changed due to ties. Will be NULL if event times were not changed.
`args`	List containing the values of the input arguments (except data)
`risk.obj`	List containing objects needed to compute pure risk estimates and variances for a different set of data
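
To show how the `beta` and `Lambda0` components relate to a pure risk, the sketch below uses the standard Cox-model expression 1 - exp(-Lambda0 * exp(beta'x)) for the risk in [Tau1, Tau2] of an individual with covariate profile x. The numerical values and the helper `pure.risk` are hypothetical, and the sketch returns point estimates only, not the variance estimates reported by the function.

```r
# Hypothetical values standing in for the beta and Lambda0 components
beta    <- c(X1 = 0.2, X2 = -0.3, X3 = 0.5)
Lambda0 <- 0.05        # cumulative baseline hazard on [Tau1, Tau2]

# Two covariate profiles, one per row
x <- data.frame(X1 = c(1, -1), X2 = c(1, 0), X3 = c(0.6, 0))

# Hypothetical helper: pure risk under the Cox model for each row of x
pure.risk <- function(beta, Lambda0, x) {
  lp <- drop(as.matrix(x[, names(beta), drop = FALSE]) %*% beta)  # beta'x
  1 - exp(-Lambda0 * exp(lp))
}

Pi <- pure.risk(beta, Lambda0, x)
Pi
```

The first profile has the larger linear predictor, so its pure risk is the larger of the two.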

References

Etievant, L., Gail, M.H. (2023). Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. Submitted.

Shin, Y.E., Pfeiffer, R.M., Graubard, B.I., Gail, M.H. (2020). Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort. Biometrics, 76, 1087-1097.

Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E. and Kulich, M. (2009). Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology. Statistics in Biosciences, 1, 32-49.

Examples

  data(dataexample.missingdata, package="CaseCohortCoxSurvival")

  data <- dataexample.missingdata$cohort
  cov1 <- "X1"
  cov2 <- c("X2", "X3")

  # Whole cohort, get pure risk estimate for every individual's profile in the cohort
  # Only robust variance estimates are computed for a whole cohort analysis.
  caseCohortCoxSurvival(data, "status", "times", cox.phase1 = cov1, x = data)

  # Stratified case-cohort analysis with missing covariate information in
  #   phase-two data, with phase-three strata
  caseCohortCoxSurvival(data, "status", "times", cox.phase1 = cov1,
                        cox.phase2 = cov2, strata = "W", subcohort = "subcohort",
                        phase3 = "phase3", strata.phase3 = "W3")
           
           
  data(dataexample, package="CaseCohortCoxSurvival")

  data <- dataexample$cohort
  cov2 <- c("X1", "X2", "X3")

  # Stratified case-cohort (phase-two) analysis with weight calibration and default 
  #   proxies to predict the phase-two covariates.
  caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2, strata = "W", 
                        subcohort = "subcohort", calibrated = TRUE)

  

  # Stratified case-cohort (phase-two) analysis with weight calibration specifying 
  #  a different set of proxy variables to predict each phase-two covariate.
  caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
                        strata = "W", subcohort = "subcohort", calibrated = TRUE,
                        predictors.cox.phase2 = list(
                          X1 = c("X1.proxy"),
                          X2 = c("X1.proxy", "X2.proxy", "W"),
                          X3 = c("X1.proxy", "X3.proxy")))

  # Stratified case-cohort (phase-two) analysis with weight calibration, get pure
  #   risk estimate for one given covariate profile.
  caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
                        strata = "W", subcohort = "subcohort", calibrated = TRUE,
                        predictors.cox.phase2 = list(
                          X1 = c("X1.proxy"),
                          X2 = c("X1.proxy", "X2.proxy", "W"),
                          X3 = c("X1.proxy", "X3.proxy")),
                        x = list(X1 = 1, X2 = 1, X3 = 0.6), Tau1 = 0, Tau2 = 8)

  # Set the correct sampling counts in phase-two for each level of strata.
  # The strata variable W has levels 0-3.
  caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
                        subcohort = "subcohort", calibrated = TRUE, strata = "W",
                        subcohort.strata.counts = list("0" = 129, "1" = 313,
                                                       "2" = 308, "3" = 311))

[Package CaseCohortCoxSurvival version 0.0.34 Index]