gof {survMisc}    R Documentation

goodness of fit test for a coxph object

Description

goodness of fit test for a coxph object

Usage

gof(x, ...)

## S3 method for class 'coxph'
gof(x, ..., G = NULL)

Arguments

x

An object of class coxph

...

Additional arguments (not implemented)

G

Number of groups into which to divide the risk score. If G=NULL (the default), uses the closest integer to

$$G = \max\left(2, \min\left(10, \frac{ne}{40}\right)\right)$$

where $ne$ is the number of events overall.
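The default for G can be sketched in base R as follows. The helper name default_G is hypothetical and is not part of survMisc. It assumes 'closest integer' corresponds to round(), which rounds exact halves to the even integer; the package may treat such ties differently.

```r
# Hypothetical helper (not part of survMisc): default number of groups
# for a given overall number of events `ne`, following the formula above.
default_G <- function(ne) {
  as.integer(round(max(2, min(10, ne / 40))))
}

default_G(20)    # few events: the lower bound of 2 applies
default_G(125)   # 125 / 40 = 3.125, rounded to the nearest integer
default_G(1000)  # many events: the upper bound of 10 applies
```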

Details

To assess the overall goodness of fit, the risk score $r_i$ for each observation $i$ is calculated as

$$r_i = \hat{\beta} X_i$$

where $\hat{\beta}$ is the vector of fitted coefficients and $X_i$ is the vector of predictor variables for observation $i$.
The risk scores are then sorted and 'lumped' into a grouping variable with $G$ groups (containing approximately equal numbers of observations).
The numbers of observed ($e$) and expected ($exp$) events in each group are used to generate a $Z$ statistic for each group, which is assumed to follow a standard normal distribution, $Z \sim N(0,1)$.
The indicator variable indicG is added to the original model and the two models are compared to determine the improvement in fit via the likelihood ratio test.
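
The steps above can be reproduced by hand with the survival package alone. The following is a minimal sketch, not the code used inside gof. It assumes that grouping is done at quantiles of the risk score, restricts the data to complete cases so that all vectors have the same length, and uses G = 5 purely for illustration. The object names (d, brks, groups, fit2, lr_stat, lr_p) are chosen here and do not appear in the package.

```r
library(survival)

# Data and model formula as in the example on this page, complete cases only
data("pbc", package = "survival")
vars <- c("time", "status", "age", "albumin", "bili", "edema", "protime")
d <- pbc[!is.na(pbc$trt), ]
d <- d[complete.cases(d[, vars]), ]
d$event <- as.integer(d$status == 2)
fit <- coxph(Surv(time, event) ~ age + log(albumin) + bili + edema + protime,
             data = d)

G <- 5  # number of groups, chosen for illustration

# Risk score; predict() centres it, which does not change the ordering
r <- predict(fit, type = "lp")

# Assumption: group at quantiles so that group sizes are roughly equal
brks <- quantile(r, probs = seq(0, 1, length.out = G + 1))
d$indicG <- factor(cut(r, breaks = brks, include.lowest = TRUE,
                       labels = FALSE))

# Observed events and expected events (event minus martingale residual)
M  <- residuals(fit, type = "martingale")
e  <- tapply(d$event, d$indicG, sum)
ex <- tapply(d$event - M, d$indicG, sum)
z  <- (e - ex) / sqrt(ex)
groups <- data.frame(n   = as.vector(table(d$indicG)),
                     e   = as.vector(e),
                     exp = as.vector(ex),
                     z   = as.vector(z),
                     p   = as.vector(2 * pnorm(-abs(z))))
groups

# Likelihood ratio test for adding the group indicator to the model
fit2 <- coxph(Surv(time, event) ~ age + log(albumin) + bili + edema +
                protime + indicG, data = d)
lr_stat <- 2 * (fit2$loglik[2] - fit$loglik[2])
lr_p    <- pchisq(lr_stat, df = G - 1, lower.tail = FALSE)
anova(fit, fit2)  # the same comparison via the anova method for coxph
```

Because the grouping rule here is an assumption, the group sizes and statistics need not match the output of gof exactly.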

Value

A list with elements:

groups

A data.table with one row for each of the $G$ groups. The columns are:

n

Number of observations

e

Number of events

exp

Number of events expected. This is

$$exp = \sum_i (e_i - M_i)$$

where $e_i$ are the events and $M_i$ are the martingale residuals for each observation $i$ in the group.

z

$Z$ score, calculated as

$$Z = \frac{e - exp}{\sqrt{exp}}$$

p

$p$-value for $Z$, which is

$$p = 2 \cdot \mathrm{pnorm}(-|z|)$$

where pnorm is the normal distribution function with mean $\mu = 0$ and standard deviation $\sigma = 1$ and $|z|$ is the absolute value of $z$.

lrTest

Likelihood-ratio test. Tests the improvement in log-likelihood with addition of an indicator variable with $G-1$ groups. This is done with survival:::anova.coxph. The test statistic is distributed as chi-square with $G-1$ degrees of freedom.

Note

The choice of $G$ is somewhat arbitrary but should rarely be $> 10$.
As illustrated in the example, a larger value for $G$ makes the $Z$ test for each group more likely to be significant. This does not affect the significance of adding the indicator variable indicG to the original model.

The $Z$ score is chosen for simplicity, as for large sample sizes the Poisson distribution approaches the normal. Strictly speaking, the Poisson would be more appropriate for $e$ and $exp$ as per Counting Theory.
The $Z$ score may be somewhat conservative as the expected events are calculated using the martingale residuals from the overall model, rather than by group. This is likely to bring the expected events closer to the observed events.
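
The difference between the normal approximation and a Poisson-based calculation can be seen for a single group with a small expected count. The sketch below uses hypothetical counts (not taken from any data set) and stats::poisson.test as one possible exact alternative; gof itself reports only the normal-based p-value.

```r
# Hypothetical counts for one group
e          <- 8     # observed events
exp_events <- 4.5   # expected events

# Z score and two-sided p-value under the normal approximation
z      <- (e - exp_events) / sqrt(exp_events)
p_norm <- 2 * pnorm(-abs(z))

# Exact two-sided Poisson test of the observed count against the expected rate
p_pois <- poisson.test(e, T = 1, r = exp_events)$p.value

c(z = z, p_norm = p_norm, p_pois = p_pois)
```

With larger expected counts the two p-values should move closer together.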

This test is similar to the Hosmer-Lemeshow test for logistic regression.

Source

Method and example are from:
May S, Hosmer DW 1998. A simplified method of calculating an overall goodness-of-fit test for the Cox proportional hazards model. Lifetime Data Analysis 4(2):109–20. doi:10.1023/A:1009612305785

References

Default value for $G$ as per:
May S, Hosmer DW 2004. A cautionary note on the use of the Gronnesby and Borgan goodness-of-fit test for the Cox proportional hazards model. Lifetime Data Analysis 10(3):283–91. doi:10.1023/B:LIDA.0000036393.29224.1d

Changes to the pbc dataset in the example are as detailed in:
Fleming T, Harrington D 2005. Counting Processes and Survival Analysis. New Jersey: Wiley and Sons. Chapter 4, section 4.6, pp 188. doi:10.1002/9781118150672

Examples

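library("survival")  # provides coxph(), Surv() and the pbc data
library("survMisc")  # provides gof()
data("pbc", package="survival")
## keep only the randomized patients (those with a treatment assignment)
pbc <- pbc[!is.na(pbc$trt), ]
## make corrections as per Fleming
pbc[pbc$id==253, "age"] <- 54.4
pbc[pbc$id==107, "protime"] <- 10.7
### misspecified; should be log(bili) and log(protime) instead
c1 <- coxph(Surv(time, status==2) ~
            age + log(albumin) + bili + edema + protime,
            data=pbc)
## compare an explicit choice of G with the default
gof(c1, G=10)
gof(c1)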


[Package survMisc version 0.5.6 Index]