R: Concordance index

cIndex {discSurv}

R Documentation

Concordance index

Description

Calculates the concordance index for discrete survival models, which does not depend on time. This is the probability that, for a pair of randomly chosen comparable samples, the sample with the higher risk prediction will experience an event before the other sample or belongs to a higher binary class.

Usage

cIndex(marker, testTime, testEvent, trainTime, trainEvent)

Arguments

`marker`	Gives the predicted values of the linear predictor of a regression model ("numeric vector"). May also be on the response scale.
`testTime`	New time intervals in the test data ("integer vector").
`testEvent`	Event indicators in the test data ("binary vector").
`trainTime`	Time intervals in the training data ("integer vector").
`trainEvent`	Event indicators in the training data ("binary vector").

Value

Value of discrete concordance index between zero and one ("numeric vector").

Note

It is assumed that all time points up to the last observed interval [a_q-1, a_q) are available.

Author(s)

Thomas Welchowski welchow@imbie.meb.uni-bonn.de

References

Schmid M, Tutz G, Welchowski T (2018). “Discrimination Measures for Discrete Time-to-Event Predictions.” Econometrics and Statistics, 7, 153-164.

Uno H, Cai T, Tian L, Wei LJ (2012). “Evaluating Prediction Rules fort-Year Survivors With Censored Regression Models.” Journal of the American Statistical Association, 102, 527-537.

Heagerty PJ, Zheng Y (2005). “Survival Model Predictive Accuracy and ROC Curves.” Biometrics, 61, 92-105.

Examples


##################################################
# Example with unemployment data and prior fitting

library(Ecdat)
library(caret)
library(mgcv)
data(UnempDur)
summary(UnempDur$spell)
# Extract subset of data
set.seed(635)
IDsample <- sample(1:dim(UnempDur)[1], 100)
UnempDurSubset <- UnempDur [IDsample, ]
set.seed(-570)
TrainingSample <- sample(1:100, 75)
UnempDurSubsetTrain <- UnempDurSubset [TrainingSample, ]
UnempDurSubsetTest <- UnempDurSubset [-TrainingSample, ]

# Convert to long format
UnempDurSubsetTrainLong <- dataLong(dataShort = UnempDurSubsetTrain, 
timeColumn = "spell", eventColumn = "censor1")

# Estimate gam with smooth baseline
gamFit <- gam(formula = y ~ s(I(as.numeric(as.character(timeInt)))) + 
s(age) + s(logwage), data = UnempDurSubsetTrainLong, family = binomial())
gamFitPreds <- predict(gamFit, newdata = cbind(UnempDurSubsetTest, 
timeInt = UnempDurSubsetTest$spell))

# Evaluate C-Index based on short data format
cIndex(marker = gamFitPreds, 
testTime = UnempDurSubsetTest$spell, 
testEvent = UnempDurSubsetTest$censor1, 
trainTime = UnempDurSubsetTrain$spell, 
trainEvent = UnempDurSubsetTrain$censor1)

#####################################
# Example National Wilm's Tumor Study

library(survival)
head(nwtco)
summary(nwtco$rel)

# Select subset
set.seed(-375)
Indices <- sample(1:dim(nwtco)[1], 500)
nwtcoSub <- nwtco [Indices, ]

# Convert time range to 30 intervals
intLim <- quantile(nwtcoSub$edrel, prob = seq(0, 1, length.out = 30))
intLim [length(intLim)] <- intLim [length(intLim)] + 1
nwtcoSubTemp <- contToDisc(dataShort = nwtcoSub, timeColumn = "edrel", intervalLimits = intLim)
nwtcoSubTemp$instit <- factor(nwtcoSubTemp$instit)
nwtcoSubTemp$histol <- factor(nwtcoSubTemp$histol)
nwtcoSubTemp$stage <- factor(nwtcoSubTemp$stage)

# Split in training and test sample
set.seed(-570)
TrainingSample <- sample(1:dim(nwtcoSubTemp)[1], round(dim(nwtcoSubTemp)[1]*0.75))
nwtcoSubTempTrain <- nwtcoSubTemp [TrainingSample, ]
nwtcoSubTempTest <- nwtcoSubTemp [-TrainingSample, ]

# Convert to long format
nwtcoSubTempTrainLong <- dataLong(dataShort = nwtcoSubTempTrain, 
timeColumn = "timeDisc", eventColumn = "rel", timeAsFactor=TRUE)

# Estimate glm
inputFormula <- y ~ timeInt + histol + instit + stage
glmFit <- glm(formula = inputFormula, data = nwtcoSubTempTrainLong, family = binomial())
linPreds <- predict(glmFit, newdata = cbind(nwtcoSubTempTest, 
timeInt = factor(nwtcoSubTempTest$timeDisc, levels=levels(nwtcoSubTempTrainLong$timeInt))))

# Evaluate C-Index based on short data format
cIndex(marker = linPreds, 
testTime = as.numeric(as.character(nwtcoSubTempTest$timeDisc)), 
testEvent = nwtcoSubTempTest$rel, 
trainTime = as.numeric(as.character(nwtcoSubTempTrain$timeDisc)), 
trainEvent = nwtcoSubTempTrain$rel)

[Package discSurv version 2.0.0 Index]