calibrationPlot {discSurv}R Documentation

Calibration Plots

Description

Calibration plot based on predictions. Overall root mean squared error (RMSE) of predicted and observed discrete hazards is calculated.

Usage

calibrationPlot(
  testPreds,
  testDataLong,
  weights = NULL,
  K = 10,
  event = "e1",
  ...
)

Arguments

testPreds

Predictions on the validation data with model fitted on training data ("numeric vector").

testDataLong

Validation data set in long format ("class data.frame").

weights

optional vector of weights ("numeric vector"). The length of weights must be equal to the number of observations of the validation data set.

K

Number of subsets for plotting ("integer vector").

event

Column names of the event to be considered for plotting (only in case of cause-specific hazards) ("character vector").

...

Additional arguments passed to plot.

Value

Calibration plot

Author(s)

Moritz Berger moritz.berger@imbie.uni-bonn.de
https://www.imbie.uni-bonn.de/personen/dr-moritz-berger/

References

Berger M, Schmid M (2018). “Semiparametric regression for discrete time-to-event data.” Statistical Modelling, 18, 322–345.

Heyard R, Timsit J, Held L, COMBACTE-MAGNET,consortium (2019). “Validation of discrete time-to-event prediction models in the presence of competing risks.” Biometrical Journal, 62, 643-657.

Berger M, Schmid M (2020). “Assessing the calibration of subdistribution hazard models in discrete time.” arXiv:2001.11240.

See Also

estRecal, dataLong, dataLongCompRisks, dataLongSubDist

Examples

####################
# Data preprocessing

# Example unemployment data
library(Ecdat)
data(UnempDur)

# Select subsample
selectInd1 <- 1:100
selectInd2 <- 101:200
trainSet <- UnempDur[which(UnempDur$spell %in% (1:10))[selectInd1], ]
valSet <- UnempDur[which(UnempDur$spell %in% (1:10))[selectInd2], ]

####################
# One event

# Convert to long format
trainSet_long <- dataLong(dataShort = trainSet, timeColumn = "spell", eventColumn = "censor1")
valSet_long <- dataLong(dataShort = valSet, timeColumn = "spell", eventColumn = "censor1")

# Estimate continuation ratio model with logit link
glmFit <- glm(formula = y ~ timeInt + age + logwage, data = trainSet_long, family = binomial())

# Calculate predicted hazards
predHazards <- predict(glmFit, newdata = valSet_long, type = "response")

# Calibration plot
calibrationPlot(predHazards, testDataLong = valSet_long)

############################
# Two cause specific hazards 

# Convert to long format
trainSet_long <- dataLongCompRisks(dataShort = trainSet, timeColumn = "spell", 
eventColumns = c("censor1", "censor4"))
valSet_long <- dataLongCompRisks(dataShort = valSet, timeColumn = "spell", 
eventColumns = c("censor1", "censor4"))

# Estimate continuation ratio model with logit link
vglmFit <- VGAM::vglm(formula = cbind(e0, e1, e2) ~ timeInt + age + logwage, data = trainSet_long, 
family = VGAM::multinomial(refLevel = "e0"))

# Calculate predicted hazards
predHazards <- VGAM::predictvglm(vglmFit, newdata = valSet_long, type = "response")

# Calibration plots
calibrationPlot(predHazards, testDataLong = valSet_long)
calibrationPlot(predHazards, testDataLong = valSet_long, event = "e2")

###############################
# Subdistribution hazards model

# Convert to long format
trainSet_long <- dataLongSubDist(dataShort = trainSet, timeColumn = "spell", 
eventColumns = c("censor1", "censor4"), eventFocus = "censor1")
valSet_long <- dataLongSubDist(dataShort = valSet, timeColumn = "spell", 
eventColumns = c("censor1", "censor4"), eventFocus = "censor1")

# Estimate continuation ratio model with logit link
glmFit <- glm(formula = y ~ timeInt + age + logwage, data = trainSet_long, 
family = binomial(), weights = trainSet_long$subDistWeights)

# Calculate predicted  hazards 
predHazards <- predict(glmFit, newdata = valSet_long, type = "response")

# Calibration plot 
calibrationPlot(predHazards, testDataLong = valSet_long, weights = valSet_long$subDistWeights)


[Package discSurv version 2.0.0 Index]