coxphw {coxphw}

R Documentation

Weighted Estimation in Cox Regression

Description

Weighted Cox regression as proposed by Schemper et al. (2009) doi:10.1002/sim.3623 provides unbiased estimates of average hazard ratios even in the case of non-proportional hazards. Time-dependent effects can be conveniently estimated by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option.

Usage

coxphw(formula, data, template = c("AHR", "ARE", "PH"), subset, na.action,
       robust = TRUE, jack = FALSE, betafix = NULL, alpha = 0.05,
       trunc.weights = 1, control, caseweights, x = TRUE, y = TRUE,
       verbose = FALSE, sorted = FALSE, id = NULL, clusterid = NULL, ...)

Arguments

`formula`	a formula object with the response on the left of the `~` operator and the model terms on the right. The response must be a survival object as returned by `Surv`. `formula` may include `offset` terms or functions of time (see Examples).
`data`	a data frame in which to interpret the variables named in `formula`.
`template`	choose among three pre-defined templates: `"AHR"` requests estimation of average hazard ratios (Schemper et al., 2009), `"ARE"` requests estimation of average regression effects (Xu and O'Quigley, 2000) and `"PH"` requests Cox proportional hazards regression. The recommended and default template is `"AHR"`.
`subset`	expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.
`na.action`	missing-data filtering. Defaults to `options()$na.action`. Applied after subsetting the data, but applied to all variables in the data set (not only those listed in the formula).
`robust`	if set to TRUE, the robust covariance estimate (Lin-Wei) is used; otherwise the Lin-Sasieni covariance estimate is applied. Default is TRUE.
`jack`	if set to TRUE, the variance is based on a complete jackknife. Each individual (as identified by `id`) is left out in turn. The resulting matrix of DFBETA residuals D is then used to compute the variance matrix: V = D'D. Default is FALSE.
`betafix`	can be used to restrict the estimation of one or more regression coefficients to pre-defined values. A vector with one element for each model term as given in `formula` is expected (with an identical order as in `formula`). If estimation of a model term is requested, then the corresponding element in `betafix` has to be set to `NA`, otherwise it should be set to the fixed parameter value. The default value is `betafix = NULL`, yielding unrestricted estimation of all regression coefficients.
`alpha`	the significance level; confidence intervals are computed with confidence level 1 - `alpha`. Default is 0.05.
`trunc.weights`	specifies a quantile at which the (combined normalized) weights are truncated. Truncation can increase the precision of the estimates, particularly if `template = "AHR"` or `template = "ARE"` is used. Default is 1 (no truncation). A value of 0.95 is recommended for mild truncation.
`control`	Object of class `coxphw.control` specifying iteration limit and other control options. Default is `coxphw.control(...)`.
`caseweights`	vector of case weights, equivalent to `weights` in `coxph`. If `caseweights` is a vector of integers, then the estimated coefficients are equivalent to estimating the model from data with the individual cases replicated as many times as indicated by `caseweights`. These weights should not be confused with the weights in weighted Cox regression that account for non-proportional hazards.
`x`	requests copying explanatory variables into the output object. Default is TRUE.
`y`	requests copying survival information into the output object. Default is TRUE.
`verbose`	requests echoing of intermediate results. Default is FALSE.
`sorted`	if set to TRUE, the data set will not be sorted prior to passing it to FORTRAN. This may speed up computations. Default is FALSE.
`id`	a vector of subject identification integers starting from 1, used only if the data are in counting process format. These IDs are used to compute the robust covariance matrix. If `id = NULL` (the default), the program assumes that each row of the data set refers to a distinct subject.
`clusterid`	a vector of cluster identification integers starting from 1. These IDs are used to compute the robust covariance matrix. If `clusterid = NULL` (the default), the program assumes that there are no clusters.
`...`	additional arguments.
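The truncation described for `trunc.weights` can be sketched in base R as below. The helper `truncate_weights()` is hypothetical and not part of coxphw, and the weights are toy values. Whether coxphw rescales the weights again after truncation is not stated on this page, so the sketch does not rescale. Note that `q = 1` caps at the maximum and leaves the weights unchanged, matching the "no truncation" default.

```r
# Hypothetical helper (not coxphw internals): cap weights at their q-quantile
truncate_weights <- function(w, q = 0.95) {
  stopifnot(is.numeric(w), q > 0, q <= 1)
  cap <- unname(quantile(w, probs = q))  # the q-quantile of the weights
  pmin(w, cap)                           # weights above the cap are set to it
}

set.seed(1)
w <- rexp(50)                     # toy positive weights, for illustration only
w_tr <- truncate_weights(w, 0.95) # mild truncation
summary(w_tr)
```

With 50 distinct weights, the 0.95 quantile lies between the 47th and 48th order statistics, so exactly the three largest weights are capped.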

Details

If Cox's proportional hazards regression is used in the presence of non-proportional hazards, i.e., with underlying time-dependent hazard ratios of prognostic factors, the average relative risk for such a factor is under- or overestimated and the power of tests for the corresponding regression parameter is reduced. In such a situation, weighted estimation provides a parsimonious alternative to more elaborate modelling of time-dependent effects. Weighted estimation in Cox regression extends the tests of Breslow and Prentice to the multi-covariate setting, just as the Cox model extends Mantel's logrank test. Weighted Cox regression can also be seen as a robust alternative to the standard Cox estimator, reducing the influence of outlying survival times on parameter estimates.

Three pre-defined templates can be requested:
1) "AHR", i.e., estimation of average hazard ratios (Schemper et al., 2009) using Prentice weights with censoring correction and robust variance estimation;
2) "ARE", i.e., estimation of average regression effects (Xu and O'Quigley, 2000) using censoring correction and robust variance estimation; or
3) "PH", i.e., Cox proportional hazards regression using robust variance estimation.
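For the "AHR" template, the weights can be sketched as the product of a Prentice-type weight and a censoring correction. The Prentice-type weight is the Kaplan-Meier estimate of survival just before each failure time. The censoring correction is the inverse of the Kaplan-Meier estimate of the censoring distribution, obtained by reversing the status indicator. This mirrors the `w.raw`, `w.obskm` and `w` columns of `w.matrix`. This is an illustration using `survival::survfit` on the `veteran` data, not the coxphw implementation. The use of left limits S(t-), the helper `km_left()`, and the rescaling to mean 1 are all assumptions.

```r
library(survival)

# Hypothetical helper: left-continuous version of a KM step function, i.e. S(t-)
km_left <- function(fit, t) {
  f <- stepfun(fit$time, c(1, fit$surv), right = TRUE)  # f(t_i) = value before the jump
  f(t)
}

dat  <- veteran                                          # example data from 'survival'
km_s <- survfit(Surv(time, status) ~ 1, data = dat)      # survival S(t)
km_g <- survfit(Surv(time, 1 - status) ~ 1, data = dat)  # censoring G(t): reversed status

t_fail <- sort(unique(dat$time[dat$status == 1]))        # distinct uncensored failure times

w_raw   <- km_left(km_s, t_fail)       # Prentice-type weight S(t-)
w_obskm <- 1 / km_left(km_g, t_fail)   # censoring correction 1 / G(t-)
w       <- w_raw * w_obskm
w       <- w / mean(w)                 # ASSUMED normalization: mean weight of 1

w_sketch <- cbind(time = t_fail, w.raw = w_raw, w.obskm = w_obskm, w = w)
head(w_sketch)
```

Early failure times receive larger Prentice-type weights. The censoring correction grows as censoring accumulates, which offsets the shrinking risk sets caused by censoring.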

Breslow's tie-handling method is used by the program; other methods of handling ties are currently not available.

A fit of coxphw with template = "PH" yields the same estimates as a fit of coxph using Breslow's tie-handling method and robust variance estimation (via cluster).
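The `survival` counterpart of template = "PH" is a `coxph` fit with `ties = "breslow"` and a robust variance. The sketch below uses `robust = TRUE`, which treats each row as its own cluster. With several rows per subject, a `cluster()` term would be needed instead. The `veteran` data and the covariates `trt` and `karno` are illustrative choices. The comparison with coxphw runs only if that package is installed.

```r
library(survival)

# Cox PH fit with Breslow ties and robust (sandwich) variance
fit_ph <- coxph(Surv(time, status) ~ trt + karno, data = veteran,
                ties = "breslow", robust = TRUE)
robust_se <- sqrt(diag(fit_ph$var))        # robust standard errors
naive_se  <- sqrt(diag(fit_ph$naive.var))  # model-based standard errors
print(cbind(coef = coef(fit_ph), robust_se, naive_se))

# Optional side-by-side comparison with coxphw, if installed
if (requireNamespace("coxphw", quietly = TRUE)) {
  fit_w <- coxphw::coxphw(Surv(time, status) ~ trt + karno, data = veteran,
                          template = "PH")
  print(cbind(coxph = coef(fit_ph), coxphw = fit_w$coefficients))
}
```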

If robust = FALSE, the program estimates the covariance matrix using the Lin (1991) and Sasieni (1993) sandwich estimate A^{-1}BA^{-1}, where -A and -B denote the sums of contributions to the second derivative of the log likelihood, weighted by w(t_j) and w(t_j)^2, respectively. This estimate does not depend on the scaling of the weights and reduces to the inverse of the information matrix when no weighting is applied. However, it is theoretically valid only under proportional hazards. Since weighted Cox regression is usually applied precisely when the proportional hazards assumption is violated, the robust Lin-Wei covariance estimate is used by default (robust = TRUE).
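The two properties of A^{-1}BA^{-1} stated above can be checked numerically. Rescaling all weights by c multiplies A by c and B by c^2, so the c factors cancel. With w(t_j) = 1, B = A and the estimate collapses to A^{-1}. In this sketch, the per-failure-time information contributions I_j (the negatives of the second-derivative contributions) are random positive definite 2 x 2 matrices, and `ls_sandwich()` is a hypothetical helper, not coxphw code.

```r
set.seed(42)
p <- 2; m <- 5                       # 2 coefficients, 5 failure times (toy sizes)

# Toy information contributions I_j, made positive definite by construction
info <- lapply(seq_len(m), function(j) {
  z <- matrix(rnorm(p * p), p)
  crossprod(z) + diag(p)
})

# Hypothetical helper: A = sum w_j I_j, B = sum w_j^2 I_j, V = A^{-1} B A^{-1}
ls_sandwich <- function(w, info) {
  A <- Reduce(`+`, Map(function(wj, I) wj * I, w, info))
  B <- Reduce(`+`, Map(function(wj, I) wj^2 * I, w, info))
  Ainv <- solve(A)
  Ainv %*% B %*% Ainv
}

w   <- c(1.4, 1.1, 0.9, 0.7, 0.5)    # illustrative decreasing weights
V   <- ls_sandwich(w, info)
V10 <- ls_sandwich(10 * w, info)     # rescaled weights: same estimate
V1  <- ls_sandwich(rep(1, m), info)  # no weighting: inverse information
```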

If some regression coefficients are held constant using betafix, no standard errors are given for these coefficients as they are not estimated in the model. The global Wald test only relates to those variables for which regression coefficients were estimated.

An offset term can be included in the formula of coxphw. This includes a variable in the model with its regression coefficient fixed at 1.
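Because an offset enters with coefficient 1, a coefficient can be fixed at an arbitrary value b by using offset(b * x). The sketch below shows this with `survival::coxph` on the `veteran` data. The value `b_karno` is arbitrary and purely illustrative.

```r
library(survival)

b_karno <- -0.03   # hypothetical fixed value for the 'karno' coefficient

# 'karno' enters the linear predictor as b_karno * karno; only 'trt' is estimated
fit_off <- coxph(Surv(time, status) ~ trt + offset(b_karno * karno),
                 data = veteran, ties = "breslow")
coef(fit_off)
```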

Value

A list with the following components:

`coefficients`	the parameter estimates.
`var`	the estimated covariance matrix.
`df`	the degrees of freedom.
`ci.lower`	the lower confidence limits of exp(beta).
`ci.upper`	the upper confidence limits of exp(beta).
`prob`	the p-values.
`linear.predictors`	the linear predictors.
`n`	the number of observations.
`dfbeta.resid`	matrix of DFBETA residuals.
`iter`	the number of iterations needed to converge.
`method.ties`	the ties handling method.
`PTcoefs`	matrix with scale and shift used for pretransformation of `fp()`-terms.
`cov.j`	the covariance matrix computed by the jackknife method (only computed if `jack = TRUE`).
`cov.lw`	the covariance matrix computed by the Lin-Wei method (robust covariance)
`cov.ls`	the covariance matrix computed by the Lin-Sasieni method.
`cov.method`	the method used to compute the (displayed) covariance matrix and the standard errors. This method is either "jack" if `jack = TRUE`, or "Lin-Wei" if `jack = FALSE`.
`w.matrix`	a matrix with one row per uncensored failure time and four columns. The first column contains the failure times; the remaining columns (labeled `w.raw`, `w.obskm`, and `w`) contain the raw weights, the weights given by the inverse of the Kaplan-Meier estimate with reversed status indicator, and the normalized product of both.
`caseweights`	if `x = TRUE` the case weights.
`Wald`	Wald-test statistics.
`means`	the means of the covariates.
`offset.values`	offset values.
`dataline`	the first dataline of the input data set (required for `plotfp`).
`x`	if `x = TRUE` the explanatory variables.
`y`	the response.
`alpha`	the significance level = 1 - confidence level.
`template`	the requested template.
`formula`	the model formula.
`betafix`	the `betafix` vector.
`call`	the function call.
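The complete jackknife behind `jack = TRUE` and `cov.j` can be sketched with `survival::coxph` by refitting the model with each subject left out in turn. It computes V = D'D, where D is the matrix of DFBETA residuals. This is an unweighted illustration on the `veteran` data, not coxphw's code. Here D is taken as the leave-one-out estimate minus the full-data estimate, and the sign convention does not affect D'D.

```r
library(survival)

dat  <- veteran[, c("time", "status", "trt", "karno")]
form <- Surv(time, status) ~ trt + karno

beta_full <- coef(coxph(form, data = dat, ties = "breslow"))

# Leave each subject (row) out in turn; one row of D per subject
D <- t(vapply(seq_len(nrow(dat)), function(i) {
  coef(coxph(form, data = dat[-i, ], ties = "breslow")) - beta_full
}, numeric(length(beta_full))))

V_jack  <- crossprod(D)          # V = D'D
se_jack <- sqrt(diag(V_jack))

# For comparison: robust standard errors from survival's sandwich estimate
fit_rob <- coxph(form, data = dat, ties = "breslow", robust = TRUE)
print(cbind(jackknife = se_jack, robust = sqrt(diag(fit_rob$var))))
```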

Note

The SAS macro WCM with similar functionality is offered for download at https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/wcmcoxphw/ .

Up to version 2.13, coxphw used a slightly different syntax (arguments: AHR, AHR.norobust, ARE, PH, normalize, censcorr, prentice, breslow, taroneware). From version 3.0.0 on, the old syntax is disabled. From version 4.0.0 on, estimation of fractional polynomials is disabled.

Author(s)

Georg Heinze, Meinhard Ploner, Daniela Dunkler

References

Dunkler D, Ploner M, Schemper M and Heinze G (2018). Weighted Cox Regression Using the R Package coxphw. Journal of Statistical Software 84, 1-26. doi:10.18637/jss.v084.i02

Lin D and Wei L (1989). The Robust Inference for the Cox Proportional Hazards Model. Journal of the American Statistical Association 84, 1074-1078.

Lin D (1991). Goodness-of-Fit Analysis for the Cox Regression Model Based on a Class of Parameter Estimators. Journal of the American Statistical Association 86, 725-728.

Royston P and Altman D (1994). Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling. Journal of the Royal Statistical Society, Series C (Applied Statistics) 43, 429-467.

Royston P and Sauerbrei W (2008). Multivariable Model-Building. A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Wiley, Chichester, UK.

Sasieni P (1993). Maximum Weighted Partial Likelihood Estimators for the Cox Model. Journal of the American Statistical Association 88, 144-152.

Schemper M (1992). Cox Analysis of Survival Data with Non-Proportional Hazard Functions. Journal of the Royal Statistical Society, Series D (The Statistician) 41, 455-465.

Schemper M, Wakounig S and Heinze G (2009). The Estimation of Average Hazard Ratios by Weighted Cox Regression. Statistics in Medicine 28, 2473-2489. doi:10.1002/sim.3623

Xu R and O'Quigley J (2000). Estimating Average Regression Effect Under Non-Proportional Hazards. Biostatistics 1, 423-439.

Examples

library(coxphw)
data("gastric")

# weighted estimation of average hazard ratio
fit1 <- coxphw(Surv(time, status) ~ radiation, data = gastric, template = "AHR")
summary(fit1)
fit1$cov.lw     # robust covariance
fit1$cov.ls     # Lin-Sasieni covariance


# unweighted (PH) estimation with a time-dependent effect of radiation,
# modelled as an interaction with time in years
# (the main effect 'radiation' must also be included in the formula!)
gastric$years <- gastric$time / 365.25
fit2 <- coxphw(Surv(years, status) ~ radiation + years : radiation, data = gastric,
               template = "PH")
summary(fit2)


# unweighted estimation with a function of time
data("gastric")
gastric$yrs <- gastric$time / 365.25

# step function of time: 0 up to 1 year, 1 afterwards
fun <- function(t) as.numeric(t > 1)
fit3 <- coxphw(Surv(yrs, status) ~ radiation + fun(yrs):radiation, data = gastric,
               template = "PH")
summary(fit3)

# for more examples see vignette or predict.coxphw

[Package coxphw version 4.0.3 Index]