R: Regression standardization in Cox proportional hazards models

stdCoxph {stdReg}

R Documentation

Regression standardization in Cox proportional hazards models

Description

stdCoxph performs regression standardization in Cox proportional hazards models, at specified values of the exposure, over the sample covariate distribution. Let T, X, and Z be the survival outcome, the exposure, and a vector of covariates, respectively. stdCoxph uses a fitted Cox proportional hazards model to estimate the standardized survival function \theta(t,x)=E\{S(t|X=x,Z)\}, where t is a specific value of T, x is a specific value of X, and the expectation is over the marginal distribution of Z.

Usage

stdCoxph(fit, data, X, x, t, clusterid, subsetnew)

Arguments

`fit`	an object of class `"coxph"`, as returned by the `coxph` function in the survival package, but without special terms `strata`, `cluster` or `tt`. Only `breslow` method for handling ties is allowed. If arguments `weights` and/or `subset` are used when fitting the model, then the same weights and subset are used in `stdGlm`.
`data`	a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in `fit`.
`X`	a string containing the name of the exposure variable `X` in `data`.
`x`	an optional vector containing the specific values of `X` at which to estimate the standardized survival function. If `X` is binary (0/1) or a factor, then `x` defaults to all values of `X`. If `X` is numeric, then `x` defaults to the mean of `X`. If `x` is set to `NA`, then `X` is not altered. This produces an estimate of the marginal survival function `S(t)=E\{S(t\|X,Z)\}`.
`t`	an optional vector containing the specific values of `T` at which to estimate the standardized survival function. It defaults to all the observed event times in `data`.
`clusterid`	an optional string containing the name of a cluster identification variable when data are clustered.
`subsetnew`	an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.

Details

stdCoxph assumes that a Cox proportional hazards model

\lambda(t|X,Z)=\lambda_0(t)exp\{h(X,Z;\beta)\}

has been fitted. Breslow's estimator of the cumulative baseline hazard \Lambda_0(t)=\int_0^t\lambda_0(u)du is used together with the partial likelihood estimate of \beta to obtain estimates of the survival function S(t|X=x,Z):

\hat{S}(t|X=x,Z)=exp[-\hat{\Lambda}_0(t)exp\{h(X=x,Z;\hat{\beta})\}].

For each t in the t argument and for each x in the x argument, these estimates are averaged across all subjects (i.e. all observed values of Z) to produce estimates

\hat{\theta}(t,x)=\sum_{i=1}^n \hat{S}(t|X=x,Z_i)/n,

where Z_i is the value of Z for subject i, i=1,...,n. The variance for \hat{\theta}(t,x) is obtained by the sandwich formula.

Value

An object of class "stdCoxph" is a list containing

`call`	the matched call.
`input`	`input` is a list containing all input arguments.
`est`	a matrix with `length(t)` rows and `length(x)` columns, where the element on row `i` and column `j` is equal to `\hat{\theta}`(`t[i],x[j]`).
`vcov`	a list with `length(t)` elements. Each element is a square matrix with `length(x)` rows. In the `k:th` matrix, the element on row `i` and column `j` is the (estimated) covariance of `\hat{\theta}`(`t[k]`,`x[i]`) and `\hat{\theta}`(`t[k]`,`x[j]`).

Note

Standardized survival functions are sometimes referred to as (direct) adjusted survival functions in the literature.

stdCoxph does not currently handle time-varying exposures or covariates.

stdCoxph internally loops over all values in the t argument. Therefore, the function will usually be considerably faster if length(t) is small.

The variance calculation performed by stdCoxph does not condition on the observed covariates \bar{Z}=(Z_1,...,Z_n). To see how this matters, note that

var\{\hat{\theta}(t,x)\}=E[var\{\hat{\theta}(t,x)|\bar{Z}\}]+var[E\{\hat{\theta}(t,x)|\bar{Z}\}].

The usual parameter \beta in a Cox proportional hazards model does not depend on \bar{Z}. Thus, E(\hat{\beta}|\bar{Z}) is independent of \bar{Z} as well (since E(\hat{\beta}|\bar{Z})=\beta), so that the term var[E\{\hat{\beta}|\bar{Z}\}] in the corresponding variance decomposition for var(\hat{\beta}) becomes equal to 0. However, \theta(t,x) depends on \bar{Z} through the average over the sample distribution for Z, and thus the term var[E\{\hat{\theta}(t,x)|\bar{Z}\}] is not 0, unless one conditions on \bar{Z}. The variance calculation by Gail and Byar (1986) ignores this term, and thus effectively conditions on \bar{Z}.

Author(s)

Arvid Sjolander

References

Chang I.M., Gelman G., Pagano M. (1982). Corrected group prognostic curves and summary statistics. Journal of Chronic Diseases 35, 669-674.

Gail M.H. and Byar D.P. (1986). Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biometrical Journal 28(5), 587-599.

Makuch R.W. (1982). Adjusted survival curve estimation using covariates. Journal of Chronic Diseases 35, 437-443.

Sjolander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.

Sjolander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.

Examples


require(survival)

n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean=Z)
T <- rexp(n, rate=exp(X+Z+X*Z)) #survival time
C <- rexp(n, rate=exp(X+Z+X*Z)) #censoring time
U <- pmin(T, C) #time at risk
D <- as.numeric(T < C) #event indicator
dd <- data.frame(Z, X, U, D)
fit <- coxph(formula=Surv(U, D)~X+Z+X*Z, data=dd, method="breslow")
fit.std <- stdCoxph(fit=fit, data=dd, X="X", x=seq(-1,1,0.5), t=1:5)
print(summary(fit.std, t=3))
plot(fit.std)

[Package stdReg version 3.4.1 Index]