calibrationPlot {discSurv} | R Documentation |
Calibration Plots
Description
Calibration plot based on predictions. Overall root mean squared error (RMSE) of predicted and observed discrete hazards is calculated.
Usage
calibrationPlot(
testPreds,
testDataLong,
weights = NULL,
K = 10,
event = "e1",
...
)
Arguments
testPreds |
Predictions on the validation data with model fitted on training data ("numeric vector"). |
testDataLong |
Validation data set in long format ("class data.frame"). |
weights |
optional vector of weights ("numeric vector"). The length of weights must be equal to the number of observations of the validation data set. |
K |
Number of subsets for plotting ("integer vector"). |
event |
Column names of the event to be considered for plotting (only in case of cause-specific hazards) ("character vector"). |
... |
Additional arguments passed to |
Value
Calibration plot
Author(s)
Moritz Berger moritz.berger@imbie.uni-bonn.de
https://www.imbie.uni-bonn.de/personen/dr-moritz-berger/
References
Berger M, Schmid M (2018).
“Semiparametric regression for discrete time-to-event data.”
Statistical Modelling, 18, 322–345.
Heyard R, Timsit J, Held L, COMBACTE-MAGNET,consortium (2019).
“Validation of discrete time-to-event prediction models in the presence of competing risks.”
Biometrical Journal, 62, 643-657.
Berger M, Schmid M (2020).
“Assessing the calibration of subdistribution hazard models in discrete time.”
arXiv:2001.11240.
See Also
estRecal
, dataLong
, dataLongCompRisks
, dataLongSubDist
Examples
####################
# Data preprocessing
# Example unemployment data
library(Ecdat)
data(UnempDur)
# Select subsample
selectInd1 <- 1:100
selectInd2 <- 101:200
trainSet <- UnempDur[which(UnempDur$spell %in% (1:10))[selectInd1], ]
valSet <- UnempDur[which(UnempDur$spell %in% (1:10))[selectInd2], ]
####################
# One event
# Convert to long format
trainSet_long <- dataLong(dataShort = trainSet, timeColumn = "spell", eventColumn = "censor1")
valSet_long <- dataLong(dataShort = valSet, timeColumn = "spell", eventColumn = "censor1")
# Estimate continuation ratio model with logit link
glmFit <- glm(formula = y ~ timeInt + age + logwage, data = trainSet_long, family = binomial())
# Calculate predicted hazards
predHazards <- predict(glmFit, newdata = valSet_long, type = "response")
# Calibration plot
calibrationPlot(predHazards, testDataLong = valSet_long)
############################
# Two cause specific hazards
# Convert to long format
trainSet_long <- dataLongCompRisks(dataShort = trainSet, timeColumn = "spell",
eventColumns = c("censor1", "censor4"))
valSet_long <- dataLongCompRisks(dataShort = valSet, timeColumn = "spell",
eventColumns = c("censor1", "censor4"))
# Estimate continuation ratio model with logit link
vglmFit <- VGAM::vglm(formula = cbind(e0, e1, e2) ~ timeInt + age + logwage, data = trainSet_long,
family = VGAM::multinomial(refLevel = "e0"))
# Calculate predicted hazards
predHazards <- VGAM::predictvglm(vglmFit, newdata = valSet_long, type = "response")
# Calibration plots
calibrationPlot(predHazards, testDataLong = valSet_long)
calibrationPlot(predHazards, testDataLong = valSet_long, event = "e2")
###############################
# Subdistribution hazards model
# Convert to long format
trainSet_long <- dataLongSubDist(dataShort = trainSet, timeColumn = "spell",
eventColumns = c("censor1", "censor4"), eventFocus = "censor1")
valSet_long <- dataLongSubDist(dataShort = valSet, timeColumn = "spell",
eventColumns = c("censor1", "censor4"), eventFocus = "censor1")
# Estimate continuation ratio model with logit link
glmFit <- glm(formula = y ~ timeInt + age + logwage, data = trainSet_long,
family = binomial(), weights = trainSet_long$subDistWeights)
# Calculate predicted hazards
predHazards <- predict(glmFit, newdata = valSet_long, type = "response")
# Calibration plot
calibrationPlot(predHazards, testDataLong = valSet_long, weights = valSet_long$subDistWeights)