| ppplot {stats} | R Documentation |
Probability-probability Plots
Description
ppplot produces a probability-probability (P-P) plot of two numerical
variables. If conf.level is given, an estimate and the corresponding
confidence band for the P-P curve under a
distribution-free semiparametric model are also plotted.
Usage
ppplot(x, y, plot.it = TRUE,
       xlab = paste("Cumulative probabilities for", deparse1(substitute(x))),
       ylab = paste("Cumulative probabilities for", deparse1(substitute(y))),
       main = "P-P plot", ..., conf.level = NULL,
       conf.args = list(link = "logit", type = "Wald", col = NA, border = NULL))
Arguments
x |
the first sample for |
y |
the second sample for |
plot.it |
logical. Should the result be plotted? |
xlab, ylab |
the |
main |
a main title for the plot. |
... |
graphical parameters. |
conf.level |
confidence level of the band. The default, |
conf.args |
list of arguments defining confidence band computation and
visualisation: |
Details
For two independent samples, denoted x and y,
the function produces a probability-probability plot (Wilk and Gnanadesikan 1968) of pairs
(\hat{F}_{x}(z), \hat{F}_{y}(z)) for observed data z = (x, y).
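
The pairs described above can be computed by hand with base R. The following is a minimal sketch, not the internal implementation of ppplot: the samples x and y are simulated stand-ins, and the object names Fx, Fy, z and pp are chosen for illustration only.

```r
## Empirical P-P pairs computed by hand, using base R only.
## x and y are simulated stand-ins, not data from this page.
set.seed(1)
x <- rlogis(50)
y <- rlogis(50, location = 2)

Fx <- ecdf(x)        # empirical distribution function of the first sample
Fy <- ecdf(y)        # empirical distribution function of the second sample
z  <- sort(c(x, y))  # pooled observations at which both are evaluated

## one row per pooled observation: (Fx(z), Fy(z))
pp <- cbind(px = Fx(z), py = Fy(z))

## both coordinates are nondecreasing in z and the last pair is (1, 1)
plot(pp, type = "s", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Cumulative probabilities for x",
     ylab = "Cumulative probabilities for y",
     main = "P-P plot (by hand)")
abline(a = 0, b = 1, col = "lightgrey")
```

If both samples came from the same distribution, the step curve would be expected to stay close to the grey diagonal.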
If the data generating process follows a model where the two distribution
functions, after appropriate transformation, are horizontally shifted
versions of each other, the probability-probability curve is a simple function of this shift, and
a confidence band can be obtained from a confidence interval for this shift
parameter; see free1way for the model and
Sewak and Hothorn (2023) for the connection to ROC curves.
Substantial deviations of the empirical curve (a step function) from the theoretical (smooth) curve indicate a lack of fit of the semiparametric model.
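
For the logit link, the theoretical curve and its band can be sketched with base R alone. The sign convention below follows the model-based curve drawn in the Examples section of this page. The helper pp_curve is hypothetical, and the values of delta and ci are made up for illustration; they are not estimates returned by free1way.

```r
## Model-based P-P curve under a logit shift model, base R only.
## delta and ci are made-up illustrative values, not output of free1way().
delta <- 2             # assumed shift parameter (log-odds ratio)
ci    <- c(1.2, 2.8)   # assumed confidence interval for the shift

## hypothetical helper: maps F_x(z) = p to F_y(z) for a given shift
pp_curve <- function(p, shift) plogis(qlogis(p) - shift)

prb <- 1:99 / 100
plot(prb, pp_curve(prb, delta), type = "l",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Cumulative probabilities for x",
     ylab = "Cumulative probabilities for y",
     main = "Model-based P-P curve")
## plugging the interval limits into the curve gives the band
lines(prb, pp_curve(prb, ci[1]), lty = 3)
lines(prb, pp_curve(prb, ci[2]), lty = 3)
abline(a = 0, b = 1, col = "lightgrey")
```

For two logistic samples that differ only by a location shift of 2, this curve with delta = 2 is the exact P-P curve, since F_y(z) = plogis(z - 2) and F_x(z) = plogis(z).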
Value
An object of class stepfun.
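
Objects of class stepfun can be evaluated, queried and plotted with the usual tools. The sketch below uses a hand-built step function sf as a stand-in, with arbitrary knots and values; what exactly the object returned by ppplot contains is not specified here, so this only illustrates the generic stepfun interface.

```r
## A hand-built step function standing in for a returned object.
## Knots and heights are arbitrary illustrative values.
sf <- stepfun(x = c(0.2, 0.5, 0.8), y = c(0, 0.1, 0.6, 1))

inherits(sf, "stepfun")   # class check
knots(sf)                 # locations of the jumps
sf(c(0.1, 0.5, 0.9))      # evaluate like an ordinary function

## draw the steps without point markers
plot(sf, xlim = c(0, 1), do.points = FALSE, verticals = TRUE,
     main = "A step function")
```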
References
Sewak A, Hothorn T (2023). “Estimating Transformations for Evaluating Diagnostic Tests with Covariate Adjustment.” Statistical Methods in Medical Research, 32(7), 1403–1419. doi:10.1177/09622802231176030.
Wilk MB, Gnanadesikan R (1968). “Probability Plotting Methods for the Analysis of Data.” Biometrika, 55(1), 1–17. doi:10.1093/biomet/55.1.1.
Examples
## make example reproducible
set.seed(29)
## well-fitting logistic model
nd <- data.frame(groups = gl(2, 50, labels = paste0("G", 1:2)))
nd$y <- rlogis(nrow(nd), location = c(0, 2)[nd$groups])
with(with(nd, split(y, groups)),
     ppplot(G1, G2, conf.level = .95,
            conf.args = list(link = "logit", type = "Wald", col = 2)))
# with appropriate Wilcoxon test and log-odds ratio
coef(ft <- free1way(y ~ groups, data = nd))
# the model-based probability-probability curve
prb <- 1:99 / 100
points(prb, plogis(qlogis(prb) - coef(ft)), pch = 3)
## the corresponding model-based receiver operating characteristic (ROC)
## curve, see Sewak and Hothorn (2023)
plot(prb, plogis(qlogis(1 - prb) - coef(ft), lower.tail = FALSE),
     xlab = "1 - Specificity", ylab = "Sensitivity", type = "l",
     main = "ROC Curve")
abline(a = 0, b = 1, col = "lightgrey")
# with confidence band
lines(prb, plogis(qlogis(1 - prb) - confint(ft, test = "Rao")[1],
                  lower.tail = FALSE), lty = 3)
lines(prb, plogis(qlogis(1 - prb) - confint(ft, test = "Rao")[2],
                  lower.tail = FALSE), lty = 3)
# and corresponding area under the ROC curve (AUC)
# with score confidence interval
coef(ft, what = "AUC")
confint(ft, test = "Rao", what = "AUC")
## ill-fitting normal model
nd$y <- rnorm(nrow(nd), mean = c(0, .5)[nd$groups], sd = c(1, 1.5)[nd$groups])
with(with(nd, split(y, groups)),
     ppplot(G1, G2, conf.level = .95,
            conf.args = list(link = "probit", type = "Wald", col = 2)))
# inappropriate probit model
coef(free1way(y ~ groups, data = nd, link = "probit"))