calibrate {rms} | R Documentation |
Resampling Model Calibration
Description
Uses bootstrapping or cross-validation to get bias-corrected
(overfitting-corrected) estimates of predicted vs. observed values based
on subsetting predictions into intervals (for survival models) or on
nonparametric smoothers (for other models). There are calibration
functions for Cox (cph), parametric survival models (psm), binary and
ordinal logistic models (lrm) and ordinary least squares (ols). For
survival models, "predicted" means predicted survival probability at a
single time point, and "observed" refers to the corresponding
Kaplan-Meier survival estimate, stratifying on intervals of predicted
survival, or, if the polspline package is installed, the predicted
survival probability as a function of transformed predicted survival
probability using the flexible hazard regression approach (see the
val.surv function for details). For logistic and linear models, a
nonparametric calibration curve is estimated over a sequence of
predicted values. The fit must have specified x=TRUE, y=TRUE. The print
and plot methods for lrm and ols models (which use calibrate.default)
print the mean absolute error in predictions, the mean squared error,
and the 0.9 quantile of the absolute error. Here, error refers to the
difference between the predicted values and the corresponding
bias-corrected calibrated values.
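The idea behind a bias-corrected calibration curve for a binary outcome
can be sketched in base R. This is only an illustration of the general
bootstrap optimism correction, not the code that calibrate runs: glm
stands in for lrm, the helper cal_curve is hypothetical, and the
simulated data and the choice of 25 evaluation points are assumptions
made for the example.

```r
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
d  <- data.frame(y, x1, x2)

# Hypothetical helper: lowess-smoothed observed outcome as a function of
# predicted probability, evaluated at the points in `at`
cal_curve <- function(p, y, at) {
  s <- lowess(p, y, iter = 0)   # iter = 0: no robustness iterations for binary y
  approx(s$x, s$y, xout = at, ties = mean, rule = 2)$y
}

fit      <- glm(y ~ x1 + x2, family = binomial, data = d)
at       <- seq(0.2, 0.8, length.out = 25)   # predicted values to calibrate
apparent <- cal_curve(fitted(fit), d$y, at)  # curve on the fitting data

B <- 40
optimism <- matrix(NA_real_, B, length(at))
for (b in seq_len(B)) {
  i  <- sample(n, replace = TRUE)            # bootstrap resample
  fb <- glm(y ~ x1 + x2, family = binomial, data = d[i, ])
  # Curve in the bootstrap sample vs. the same model applied to the original data
  train <- cal_curve(fitted(fb), d$y[i], at)
  test  <- cal_curve(predict(fb, d, type = "response"), d$y, at)
  optimism[b, ] <- train - test
}

# Subtract the average optimism from the apparent curve
corrected <- apparent - colMeans(optimism)
round(cbind(at, apparent, corrected), 3)
```

Comparing the apparent and corrected columns shows how much of the
apparent agreement is attributable to overfitting under these
assumptions.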
Below, the second, third, and fourth invocations of calibrate are,
respectively, for ols and lrm, cph, and psm. The first and second plot
invocations are, respectively, for cph and psm fits (class calibrate)
and for lrm and ols fits (class calibrate.default).
Usage
calibrate(fit, ...)
## Default S3 method:
calibrate(fit, predy,
method=c("boot","crossvalidation",".632","randomization"),
B=40, bw=FALSE, rule=c("aic","p"),
type=c("residual","individual"),
sls=.05, aics=0, force=NULL, estimates=TRUE, pr=FALSE, kint,
smoother="lowess", digits=NULL, ...)
## S3 method for class 'cph'
calibrate(fit, cmethod=c('hare', 'KM'),
method="boot", u, m=150, pred, cuts, B=40,
bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0, force=NULL,
estimates=TRUE,
pr=FALSE, what="observed-predicted", tol=1e-12, maxdim=5, ...)
## S3 method for class 'psm'
calibrate(fit, cmethod=c('hare', 'KM'),
method="boot", u, m=150, pred, cuts, B=40,
bw=FALSE, rule="aic",
type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
pr=FALSE, what="observed-predicted", tol=1e-12, maxiter=15,
rel.tolerance=1e-5, maxdim=5, ...)
## S3 method for class 'calibrate'
print(x, B=Inf, ...)
## S3 method for class 'calibrate.default'
print(x, B=Inf, ...)
## S3 method for class 'calibrate'
plot(x, xlab, ylab, subtitles=TRUE, conf.int=TRUE,
cex.subtitles=.75, riskdist=TRUE, add=FALSE,
scat1d.opts=list(nhistSpike=200), par.corrected=NULL, ...)
## S3 method for class 'calibrate.default'
plot(x, xlab, ylab, xlim, ylim,
legend=TRUE, subtitles=TRUE, cex.subtitles=.75, riskdist=TRUE,
scat1d.opts=list(nhistSpike=200), ...)
Arguments
fit |
a fit from |
x |
an object created by |
method , B , bw , rule , type , sls , aics , force , estimates |
see |
cmethod |
method for validating survival predictions using
right-censored data. The default is |
u |
the time point for which to validate predictions for survival
models. For |
m |
group predicted |
pred |
vector of predicted survival probabilities at which to evaluate the
calibration curve. By default, the low and high prediction values
from |
cuts |
actual cut points for predicted survival probabilities. You may
specify only one of |
pr |
set to |
what |
The default is |
tol |
criterion for matrix singularity (default is |
maxdim |
see |
maxiter |
for |
rel.tolerance |
parameter passed to
|
predy |
a scalar or vector of predicted values to calibrate (for |
kint |
For an ordinal logistic model the default predicted
probability that |
smoother |
a function in two variables which produces |
digits |
If specified, predicted values are rounded to
|
... |
other arguments to pass to |
xlab |
defaults to "Predicted x-units Survival" or to a suitable label for other models |
ylab |
defaults to "Fraction Surviving x-units" or to a suitable label for other models |
xlim , ylim |
2-vectors specifying x- and y-axis limits, if not using defaults |
subtitles |
set to |
conf.int |
set to |
cex.subtitles |
character size for plotting subtitles |
riskdist |
set to |
add |
set to |
scat1d.opts |
a list specifying options to send to |
par.corrected |
a list specifying graphics parameters |
legend |
set to |
Details
If the fit was created using penalized maximum likelihood estimation,
the same penalty
and penalty.scale
parameters are used during
validation.
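A minimal sketch of calibrating a penalized fit follows. The simulated
data and the penalty value of 2 are arbitrary assumptions chosen for
illustration; per the note above, the penalty stored in the fit is
reused in each resample, so no penalty argument is given to calibrate.

```r
library(rms)

set.seed(3)
n  <- 250
x1 <- runif(n)
x2 <- runif(n)
y  <- rbinom(n, 1, plogis(-1 + 2 * x1 + x2))

# Penalized maximum likelihood fit; penalty = 2 is an arbitrary choice
f <- lrm(y ~ x1 + x2, x = TRUE, y = TRUE, penalty = 2)

# B = 40 is the documented default; larger values are usual in practice
cal <- calibrate(f, B = 40)
plot(cal)
```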
Value
matrix specifying mean predicted survival in each interval, the
corresponding estimated bias-corrected Kaplan-Meier estimates,
number of subjects, and other statistics. For linear and logistic models,
the matrix instead has rows corresponding to the prediction points, and
the vector of predicted values being validated is returned as an attribute.
The returned object has class "calibrate" or "calibrate.default".
plot.calibrate.default invisibly returns the vector of estimated
prediction errors corresponding to the dataset used to fit the model.
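The returned object can be inspected as sketched below for an ols fit.
The simulated data are an assumption made for the example, and the
summaries at the end are computed by hand from the vector that
plot.calibrate.default returns; they are meant to mirror, not
reproduce, the statistics printed by the package.

```r
library(rms)

set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)

f   <- ols(y ~ x1 + x2, x = TRUE, y = TRUE)
cal <- calibrate(f, B = 40)

class(cal)      # class of the returned object
colnames(cal)   # columns of the matrix, one row per prediction point

# plot() invisibly returns the estimated prediction errors for the
# observations used to fit the model
err <- plot(cal)
mean(abs(err), na.rm = TRUE)            # mean absolute error
quantile(abs(err), 0.9, na.rm = TRUE)   # 0.9 quantile of absolute error
```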
Side Effects
prints, and stores an object pred.obs or .orig.cal
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
See Also
validate, predab.resample, groupkm, errbar, scat1d, cph, psm, lowess,
fit.mult.impute, processMI
Examples
require(survival)
set.seed(1)
n <- 200
d.time <- rexp(n)
x1 <- runif(n)
x2 <- factor(sample(c('a', 'b', 'c'), n, TRUE))
f <- cph(Surv(d.time) ~ pol(x1,2) * x2, x=TRUE, y=TRUE, surv=TRUE, time.inc=1.5)
# or f <- psm(S ~ ...)
pa <- requireNamespace('polspline')
if(pa) {
cal <- calibrate(f, u=1.5, B=20) # cmethod='hare'
plot(cal)
}
cal <- calibrate(f, u=1.5, cmethod='KM', m=50, B=20) # usually B=200 or 300
plot(cal, add=pa)
set.seed(1)
y <- sample(0:2, n, TRUE)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
x4 <- runif(n)
f <- lrm(y ~ x1 + x2 + x3 * x4, x=TRUE, y=TRUE)
cal <- calibrate(f, kint=2, predy=seq(.2, .8, length=60),
group=y)
# group= does k-sample validation: make resamples have same
# numbers of subjects in each level of y as original sample
plot(cal)
#See the example for the validate function for a method of validating
#continuation ratio ordinal logistic models. You can do the same
#thing for calibrate