caseCohortCoxSurvival {CaseCohortCoxSurvival}  R Documentation 
Function for estimating parameters (log-relative hazard, baseline hazards, cumulative baseline hazard, pure risks) and their variance (robust, or accounting for the sampling features) from cohort or case-cohort data, under the Cox model.
caseCohortCoxSurvival(data, status, time,
cox.phase1=NULL, cox.phase2=NULL, other.covars=NULL,
strata=NULL, weights.phase2=NULL, calibrated=FALSE,
subcohort=NULL, subcohort.strata.counts=NULL,
predict=TRUE, predicted.cox.phase2=NULL,
predictors.cox.phase2=NULL,
aux.vars=NULL, aux.method="Shin",
phase3=NULL, strata.phase3=NULL,
weights.phase3=NULL, weights.phase3.type="both",
Tau1=NULL, Tau2=NULL, x=NULL,
weights.op=NULL, print=1)
data 
Data frame containing the cohort and all variables needed for the analysis. 
status 
Column name in 
time 
Column name(s) in 
cox.phase1 
Column name(s) in 
cox.phase2 
Column name(s) in 
other.covars 
Column name(s) in data giving other covariates
measured on the entire cohort that might be useful,
alone or in combination with 
strata 
NULL or column name in data with the stratum value for each individual in the cohort. The number of strata used for the sampling of the subcohort equals the number of different stratum values. For example, a stratum variable might take values 0,1,2,3 or 4. The default is NULL. 
weights.phase2 
NULL or column name in data giving the phase-two design
weights for each individual in the cohort.
For a whole cohort analysis (see 
calibrated 
TRUE or FALSE to calibrate the 
subcohort 
NULL or column name in 
subcohort.strata.counts 
NULL or a list of the number of individuals sampled into the subcohort from each stratum of strata. The names in the list must be the strata values and the length of the list must be equal to the number of strata. If NULL, then the count for each stratum is estimated by the number of subcohort individuals in each stratum. The default is NULL. 
predict 
TRUE or FALSE to predict the phase-two covariates using

predicted.cox.phase2 
NULL or a named list giving the predicted values of the
phase-two covariates ( 
predictors.cox.phase2 
NULL, a vector, or a list specifying the columns in data
to use as predictor variables for obtaining the predicted values
on the whole cohort for the phase-two covariates ( 
aux.vars 
NULL or column name(s) in data giving the auxiliary variables for
each individual in the cohort. This option is only used when

aux.method 
"Breslow" or "Shin" to specify the algorithm used to construct the
auxiliary variables. This option is only used if 
phase3 
NULL or column name in data giving the indicators of membership in
the phase-three sample. The indicators are 1 if the individual belongs to the
phase-three sample and 0 otherwise. All individuals in the phase-three sample
must also belong to the phase-two sample.
This option is not used if 
strata.phase3 
NULL or column name in 
weights.phase3 
NULL or column name in 
weights.phase3.type 
One of NULL, "design", "estimated", or "both" to specify whether the phase-three weights are known design weights or are to be estimated. The variance estimation differs for estimated and design weights. If set to "both", then both variance estimates are computed. If not NULL, then only the first letter is matched for this option. The default is "both". 
Tau1 
NULL or left bound of the time interval considered for the cumulative baseline hazard and the pure risk. If NULL, then the first event time is used. 
Tau2 
NULL or right bound of the time interval considered for the cumulative baseline hazard and the pure risk. If NULL, then the last event time is used. 
x 
Data frame containing 
weights.op 
NULL or a list of options for calibration of phase-two design weights
or estimation of phase-three design weights.
The available options are 
print 
0-3 to print information as the analysis is performed.
The larger the value, the more information is printed. To suppress
all printed output, set print = 0.
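Tau1, Tau2 and x determine the interval and the covariate profile(s) for which pure risk is reported. As a hedged illustration of how these pieces fit together, the base R sketch below computes a weighted Breslow cumulative baseline hazard over [Tau1, Tau2] and the pure risk 1 - exp(-exp(x'beta) Lambda0) for a single profile x, assuming beta has already been estimated. The helper name pure_risk is hypothetical, not a package function; it ignores the tie handling reported in changed.times, and the treatment of the interval endpoints is an assumption.

```r
# Hypothetical helper (not part of CaseCohortCoxSurvival): weighted Breslow
# cumulative baseline hazard over [Tau1, Tau2] and the pure risk
#   Pi(x) = 1 - exp(-exp(x' beta) * Lambda0),
# where Lambda0 sums the hazard increments at event times in [Tau1, Tau2].
# time, status, X and w describe the phase-two sample (w = 1 for a whole
# cohort); beta is an already estimated log-relative hazard vector.
pure_risk <- function(time, status, X, w, beta, x, Tau1, Tau2) {
  X <- as.matrix(X)
  rel_haz <- as.vector(exp(X %*% beta))          # exp(X_i' beta)
  ev <- sort(unique(time[status == 1]))
  ev <- ev[ev >= Tau1 & ev <= Tau2]              # event times in [Tau1, Tau2]
  dLambda0 <- vapply(ev, function(t) {
    at_risk <- time >= t
    sum(w[time == t & status == 1]) /            # weighted number of events at t
      sum(w[at_risk] * rel_haz[at_risk])         # weighted risk set at t
  }, numeric(1))
  1 - exp(-exp(sum(x * beta)) * sum(dLambda0))
}

# Toy data: two individuals, one binary covariate, hazard ratio 2.
pure_risk(time = c(1, 2), status = c(1, 1), X = c(0, 1), w = c(1, 1),
          beta = log(2), x = 1, Tau1 = 0, Tau2 = 2)
```

In the toy call the increments are 1/3 and 1/2, so the result is 1 - exp(-5/3), approximately 0.811.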
The different scenarios covered by the function are:
1) Whole cohort (subcohort = NULL)
2) (stratified) case-cohort (= stratified phase-two sample with no missing covariate data)
a. With design weights (subcohort, strata, calibrated = FALSE)
b. With calibrated weights and proxies to predict phase-two covariates and obtain the
auxiliary variables (subcohort, strata, calibrated = TRUE, predict = TRUE,
predictors.cox.phase2, aux.method)
c. With calibrated weights and externally supplied predicted values of phase-two covariates
(calibrated = TRUE, strata, predict = FALSE, predicted.cox.phase2)
3) (unstratified) case-cohort (= unstratified phase-two sample with no missing covariate data)
a. With design weights (subcohort, strata = NULL, calibrated = FALSE)
b. With calibrated weights and proxies to predict phase-two covariates and obtain the
auxiliary variables (subcohort, strata = NULL, calibrated = TRUE, predict = TRUE,
predictors.cox.phase2, aux.method)
c. With calibrated weights and externally supplied predicted values of phase-two covariates
(calibrated = TRUE, strata = NULL, predict = FALSE, predicted.cox.phase2)
4) Case-cohort (= phase-three sample, because of missing covariate information in phase-two
data, with stratified or unstratified phase-two sampling)
a. With known phase-three design weights (subcohort, strata, phase3, strata.phase3,
weights.phase3.type = "design")
b. With estimated phase-three design weights (subcohort, strata, phase3, strata.phase3,
weights.phase3.type = "estimated")
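Scenarios 2a and 3a use the phase-two design weights directly. As a hedged illustration, the base R helper below (design_weights is a hypothetical name, not a package function) builds stratified case-cohort design weights under the common convention that every case gets weight 1 and each non-case subcohort member in stratum k gets N_k / n_k, where N_k is the cohort size of stratum k and n_k the number sampled into the subcohort from that stratum. As with subcohort.strata.counts = NULL, n_k is counted from the data unless a named list of counts is supplied. Whether the package uses exactly this convention is an assumption here.

```r
# Hypothetical helper (not part of CaseCohortCoxSurvival): stratified
# case-cohort design weights. Cases get weight 1; non-case subcohort
# members in stratum k get N_k / n_k; individuals outside the phase-two
# sample get NA.
design_weights <- function(status, subcohort, strata, counts = NULL) {
  strata <- as.character(strata)
  N_k <- table(strata)                        # cohort size per stratum
  if (is.null(counts)) {
    # Like subcohort.strata.counts = NULL: n_k is the number of
    # subcohort members observed in each stratum.
    n_k <- table(strata[subcohort == 1])
  } else {
    n_k <- unlist(counts)                     # named list, e.g. list("0" = 129)
  }
  w <- rep(NA_real_, length(status))
  w[status == 1] <- 1
  nc <- status == 0 & subcohort == 1          # non-case subcohort members
  w[nc] <- as.numeric(N_k[strata[nc]]) / as.numeric(n_k[strata[nc]])
  w
}

# Toy cohort with two strata of size 4.
status    <- c(1, 0, 0, 0, 1, 0, 0, 0)
subcohort <- c(0, 1, 0, 0, 1, 1, 1, 0)
strata    <- c("0", "0", "0", "0", "1", "1", "1", "1")
design_weights(status, subcohort, strata)
# returns c(1, 4, NA, NA, 1, 4/3, 4/3, NA)
```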
covariates and prediction
Prediction of phase-two covariates is performed when calibrated = TRUE, predict = TRUE,
aux.vars = NULL and predicted.cox.phase2 = NULL. If predictors.cox.phase2 = NULL,
all the covariates measured on the entire cohort are used for the prediction
(see cox.phase1 and other.covars).
Prediction of phase-two covariates is performed by linear regression for a continuous variable,
logistic regression for a binary variable, and the function multinom (package nnet) for a
categorical variable. Dummy variables should not be used for categorical covariates,
because separate logistic (or linear) regressions would then be performed on each dummy variable.
Alternatively, predicted values of phase-two covariates on the whole cohort can be specified with
predicted.cox.phase2.
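A minimal base R sketch of the prediction step described above, on simulated data: the column names X1, X2, X1.proxy and X2.proxy echo the Examples but the data are made up, and for simplicity the phase-two sample is just the subcohort. Each phase-two covariate is regressed on its proxy among phase-two members (lm for the continuous X1, logistic glm for the binary X2), and predictions are then made for every cohort member. For the binary covariate this sketch keeps the fitted probability; the package's exact choice is not stated on this page. A categorical covariate would instead be passed as a single factor response to nnet::multinom, not as dummy variables.

```r
# Minimal sketch on simulated data (not the package's code): predict
# phase-two covariates on the whole cohort from phase-one proxies.
set.seed(1)
n <- 200
cohort <- data.frame(X1.proxy = rnorm(n), X2.proxy = rnorm(n),
                     subcohort = rbinom(n, 1, 0.3))
cohort$X1 <- cohort$X1.proxy + rnorm(n, sd = 0.5)      # continuous covariate
cohort$X2 <- rbinom(n, 1, plogis(cohort$X2.proxy))      # binary covariate
# Phase-two covariates are only observed for phase-two members.
cohort$X1[cohort$subcohort == 0] <- NA
cohort$X2[cohort$subcohort == 0] <- NA

phase2 <- cohort[cohort$subcohort == 1, ]
fit1 <- lm(X1 ~ X1.proxy, data = phase2)                     # linear regression
fit2 <- glm(X2 ~ X2.proxy, family = binomial, data = phase2) # logistic regression

# Predictions for every cohort member, including those never sampled.
cohort$X1.pred <- predict(fit1, newdata = cohort)
cohort$X2.pred <- predict(fit2, newdata = cohort, type = "response")
```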
calibration
Calibrating the design weights against informative auxiliary variables,
measured on all cohort members, can increase efficiency.
When calibrated = TRUE, the user can either provide the auxiliary variables
(aux.vars) or let the driver function build them (aux.method).
Construction of the auxiliary variables follows Breslow et al. (2009) or Shin et al. (2020)
(see aux.method) and relies on predictions of the phase-two covariates for all members
of the cohort (see covariates and prediction above).
The auxiliary variables are (i) the influences for the log-relative hazard parameters
estimated from the Cox model fitted to the imputed cohort data; and (ii) the products of the total
follow-up time (on the time interval for which pure risk is to be estimated) with the estimated
relative hazard for the imputed cohort data, where the log-relative hazard parameters are
estimated from the Cox model fitted to the case-cohort data with weights calibrated against (i).
When aux.method = "Breslow", the design weights are calibrated against (i),
as proposed by Breslow et al. (2009) to improve the efficiency of case-cohort estimates
of relative hazard. When aux.method = "Shin", the weights are calibrated against (i) and (ii),
as proposed by Shin et al. (2020) to improve the efficiency of relative hazard and pure risk
estimates under the nested case-control design.
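The calibration step can be sketched with raking (exponential) calibration, solved by Newton iterations in base R. This is a generic illustration and not necessarily the calibration routine the package uses. In the package's setting the columns of A would be the auxiliary variables (i), plus (ii) when aux.method = "Shin", evaluated on the phase-two sample, with totals computed on the whole cohort; here A and the totals are made-up toy values, and the intercept column (so the calibrated weights also reproduce the cohort size) is an illustrative choice. The name rake_weights is hypothetical.

```r
# Hypothetical helper: raking calibration. Finds lambda such that the
# calibrated weights w * exp(A %*% lambda) reproduce the cohort totals
# of the auxiliary variables in the columns of A.
rake_weights <- function(w, A, totals, tol = 1e-10, maxit = 50) {
  A <- as.matrix(A)
  lambda <- rep(0, ncol(A))
  for (it in seq_len(maxit)) {
    g <- as.vector(w * exp(A %*% lambda))    # current calibrated weights
    resid <- totals - colSums(g * A)         # unmet calibration equations
    if (max(abs(resid)) < tol) break
    J <- crossprod(A, g * A)                 # Jacobian t(A) %*% diag(g) %*% A
    lambda <- lambda + solve(J, resid)       # Newton step
  }
  as.vector(w * exp(A %*% lambda))
}

# Toy phase-two sample: design weights, intercept plus one auxiliary variable.
w <- c(2, 2, 2, 1, 1)
A <- cbind(1, c(0.5, -1, 2, 0.3, 1))
totals <- c(10, 4)                           # made-up cohort totals
wc <- rake_weights(w, A, totals)
colSums(wc * A)                              # matches totals up to tol
```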
Note
If subcohort = NULL, then a whole cohort analysis is run and only robust variance estimates
are computed.
A list with class casecohortcoxsurv
containing:
beta
Log-relative hazard estimates
Lambda0
Cumulative baseline hazard estimate in [Tau1, Tau2]
beta.var
Influence-based variance estimate for beta
Lambda0.var
Influence-based variance estimate for Lambda0
beta.var.estimated
Influence-based variance estimate for beta with estimated phase-three weights
Lambda0.var.estimated
Influence-based variance estimate for Lambda0 with estimated phase-three weights
beta.var.design
Influence-based variance estimate for beta with design phase-three weights
Lambda0.var.design
Influence-based variance estimate for Lambda0 with design phase-three weights
beta.robustvar
Robust variance estimate for beta
Lambda0.robustvar
Robust variance estimate for Lambda0
beta.robustvar.estimated
Robust variance estimate for beta with estimated phase-three weights
Lambda0.robustvar.estimated
Robust variance estimate for Lambda0 with estimated phase-three weights
beta.robustvar.design
Robust variance estimate for beta with design phase-three weights
Lambda0.robustvar.design
Robust variance estimate for Lambda0 with design phase-three weights
Pi.var
Matrix of pure risk estimates in [Tau1, Tau2] and their variance estimates
Pi.var.estimated
Matrix of pure risk estimates in [Tau1, Tau2] and their variance estimates
with estimated phase-three weights
Pi.var.design
Matrix of pure risk estimates in [Tau1, Tau2] and their variance estimates
with design phase-three weights
coxph.fit
Object returned by coxph for the model fit
changed.times
Matrix of original and new event times for individuals whose event times were
changed because of ties; NULL if no event times were changed.
args
List containing the values of the input arguments (except data)
risk.obj
List containing objects needed to compute pure risk estimates and variances
for a different set of data
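The beta component and one of its variance estimates can be turned into hazard ratios with Wald confidence intervals. The sketch below assumes the variance component (for example beta.var or beta.robustvar) is the variance-covariance matrix of beta, which this page does not state explicitly; hr_ci is a hypothetical helper, not a package function.

```r
# Hypothetical helper: hazard ratios with Wald confidence intervals from a
# log-relative hazard vector and its variance-covariance matrix.
hr_ci <- function(beta, V, level = 0.95) {
  se <- sqrt(diag(as.matrix(V)))             # standard errors
  z <- qnorm(1 - (1 - level) / 2)            # about 1.96 for level = 0.95
  cbind(HR = exp(beta), lower = exp(beta - z * se), upper = exp(beta + z * se))
}

# Toy values, not output from the package.
hr_ci(beta = c(0, log(2)), V = diag(c(0.01, 0.04)))
```

With a fitted object fit, a call such as hr_ci(fit$beta, fit$beta.var) would apply the same computation, under the assumption above.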
Etievant, L., Gail, M.H. (2023). Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. Submitted.
Shin, Y.E., Pfeiffer, R.M., Graubard, B.I., Gail, M.H. (2020). Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort. Biometrics, 76, 1087-1097.
Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E., Kulich, M. (2009). Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Statistics in Biosciences, 1, 32-49.
data(dataexample.missingdata, package="CaseCohortCoxSurvival")
data <- dataexample.missingdata$cohort
cov1 <- "X1"
cov2 <- c("X2", "X3")
# Whole cohort, get pure risk estimate for every individual's profile in the cohort
# Only robust variance estimates are computed for a whole cohort analysis.
caseCohortCoxSurvival(data, "status", "times", cox.phase1 = cov1, x = data)
# Stratified case-cohort analysis with missing covariate information in
# phase-two data, with phase-three strata
caseCohortCoxSurvival(data, "status", "times", cox.phase1 = cov1,
cox.phase2 = cov2, strata = "W", subcohort = "subcohort",
phase3 = "phase3", strata.phase3 = "W3")
data(dataexample, package="CaseCohortCoxSurvival")
data <- dataexample$cohort
cov2 <- c("X1", "X2", "X3")
# Stratified case-cohort (phase-two) analysis with weight calibration and default
# proxies to predict the phase-two covariates.
caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2, strata = "W",
subcohort = "subcohort", calibrated = TRUE)
# Stratified case-cohort (phase-two) analysis with weight calibration, specifying
# a different set of proxy variables to predict each phase-two covariate.
caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
strata = "W", subcohort = "subcohort", calibrated = TRUE,
predictors.cox.phase2 = list(X1 = c("X1.proxy"),
X2 = c("X1.proxy", "X2.proxy", "W"), X3 = c("X1.proxy", "X3.proxy")))
# Stratified case-cohort (phase-two) analysis with weight calibration, get pure
# risk estimate for one given covariate profile.
caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
strata = "W", subcohort = "subcohort", calibrated = TRUE,
predictors.cox.phase2=list(X1 = c("X1.proxy"),
X2 = c("X1.proxy", "X2.proxy", "W"), X3 = c("X1.proxy", "X3.proxy")),
x = list(X1 = 1, X2 = 1, X3 = 0.6), Tau1 = 0, Tau2 = 8)
# Set the correct phase-two sampling counts for each level of strata.
# The strata variable W has levels 0-3.
caseCohortCoxSurvival(data, "status", "times", cox.phase2 = cov2,
subcohort = "subcohort", calibrated = TRUE, strata = "W",
subcohort.strata.counts = list("0"=129, "1"=313, "2"=308, "3"=311))