rocit {ROCit}    R Documentation
ROC Analysis of Binary Classifier
Description
rocit is the main function of the ROCit package. Given the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and the false positive rate (1 - specificity) at convenient cutoff values to construct the ROC curve. The function returns an object of class "rocit", which can be passed as an argument to other S3 methods.
Usage
rocit(score, class, negref = NULL, method = "empirical", step = FALSE)
Arguments
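score | A numeric array of diagnostic scores. |
class | An array of the same length as score, containing the class of each observation. |
negref | The reference value of class that identifies negative responses (e.g. negref = "-" in the Examples); same as the … |
method | The method of estimating the ROC curve. Currently supports "empirical", "binormal" and "nonparametric"; see "Details". |
step | Logical, default FALSE. Controls how ties are handled in the "empirical" ROC curve; see "Details". |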
Details
The ROC curve is defined as the set of ordered pairs
(FPR(c), TPR(c)), -\infty < c < \infty,
where FPR(c) = P(D \ge c | Y = 0) and TPR(c) = P(D \ge c | Y = 1) at cutoff c.
Alternatively, it can be defined as:
y(x) = 1 - G[F^{-1}(1-x)], 0 \le x \le 1
where F and G are the cumulative distribution functions of the diagnostic score in negative and positive responses, respectively.
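To see that the two definitions describe the same curve, the sketch below assumes normal score distributions with hypothetical parameters (mu0, sd0 for negatives, i.e. F; mu1, sd1 for positives, i.e. G). It computes (FPR(c), TPR(c)) over a grid of cutoffs and compares TPR(c) with y(x) evaluated at x = FPR(c). This is an illustration only, not code from ROCit.

```r
# Hypothetical parameters: negatives ~ N(mu0, sd0) (CDF F), positives ~ N(mu1, sd1) (CDF G)
mu0 <- 0;   sd0 <- 1
mu1 <- 1.5; sd1 <- 1.2

# Cutoff form: FPR(c) = P(D >= c | Y = 0), TPR(c) = P(D >= c | Y = 1)
cutoffs <- seq(-4, 6, length.out = 201)
fpr <- 1 - pnorm(cutoffs, mean = mu0, sd = sd0)
tpr <- 1 - pnorm(cutoffs, mean = mu1, sd = sd1)

# Alternative form: y(x) = 1 - G[F^{-1}(1 - x)]
roc_y <- function(x) 1 - pnorm(qnorm(1 - x, mean = mu0, sd = sd0), mean = mu1, sd = sd1)

# Evaluating y(x) at x = FPR(c) reproduces TPR(c) up to floating-point error
max_diff <- max(abs(roc_y(fpr) - tpr))
print(max_diff)
```

The equivalence follows from FPR(c) = 1 - F(c): solving x = 1 - F(c) gives c = F^{-1}(1 - x), and substituting into TPR(c) = 1 - G(c) yields y(x).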
rocit evaluates TPR and FPR values at convenient cutoffs. For method = "empirical", as the name implies, the empirical TPR and FPR values are evaluated. For method = "binormal", the diagnostic score in each group is assumed to be normally distributed, and the parameters are estimated by maximum likelihood. For method = "nonparametric", kernel density estimates (computed with density) are applied with the following bandwidths:
- h_Y = 0.9 \min(\sigma_Y, \mathrm{IQR}(D_Y)/1.34) / n_Y^{1/5}
- h_{\bar{Y}} = 0.9 \min(\sigma_{\bar{Y}}, \mathrm{IQR}(D_{\bar{Y}})/1.34) / n_{\bar{Y}}^{1/5}
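as described in Zou et al. (1997). Here Y and \bar{Y} denote the positive and negative groups, and \sigma, IQR(D) and n are the standard deviation, interquartile range and number of diagnostic scores in each group. The CDFs are then estimated from the kernel estimates of the PDFs using the trapezoidal rule.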
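The nonparametric steps (bandwidth, Gaussian kernel density via density, trapezoidal integration to a CDF, then TPR and FPR) can be sketched in base R as below. This is not ROCit's internal code: the helper names zou_bw and kernel_cdf, the simulated scores, the grid padding of four bandwidths and the grid size of 1024 points are all hypothetical choices made for illustration.

```r
# Bandwidth rule from the Details section
zou_bw <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)

# Kernel CDF on a fixed grid: Gaussian KDE from density(), then cumulative trapezoid
kernel_cdf <- function(d, h, grid) {
  pdf <- density(d, bw = h, kernel = "gaussian",
                 from = min(grid), to = max(grid), n = length(grid))$y
  dx <- diff(grid)
  c(0, cumsum(dx * (head(pdf, -1) + tail(pdf, -1)) / 2))
}

# Simulated (hypothetical) diagnostic scores for the two groups
set.seed(1)
neg_D <- rnorm(200, mean = 0, sd = 1)
pos_D <- rnorm(150, mean = 1.5, sd = 1.2)

h_neg <- zou_bw(neg_D)
h_pos <- zou_bw(pos_D)

# Grid wide enough that both estimated CDFs start near 0 and end near 1
pad  <- 4 * max(h_neg, h_pos)
grid <- seq(min(neg_D, pos_D) - pad, max(neg_D, pos_D) + pad, length.out = 1024)

F_hat <- kernel_cdf(neg_D, h_neg, grid)  # estimated CDF of negatives
G_hat <- kernel_cdf(pos_D, h_pos, grid)  # estimated CDF of positives

# Smooth ROC: at each grid cutoff c, FPR(c) = 1 - F(c) and TPR(c) = 1 - G(c)
FPR <- 1 - F_hat
TPR <- 1 - G_hat
```

The bandwidth expression has the same form as the rule of thumb implemented by R's bw.nrd0, so for typical data bw.nrd0(d) returns the same value as zou_bw(d).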
For "empirical" ROC, the algorithm first rank-orders the data and calculates TPR and FPR by treating all observations with scores at or above a given level as positive. If step is TRUE, the ROC curve is generated from all the calculated {FPR, TPR} pairs, regardless of ties in the data. If step is FALSE, the ROC curve follows a diagonal path across ties.
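The rank-and-accumulate procedure can be sketched in base R as follows. The helper names empirical_roc and trapezoid_auc are hypothetical, class is assumed to be already coded as 0 (negative) and 1 (positive), and the tie handling is one plausible reading of the step argument rather than ROCit's actual implementation. trapezoid_auc applies the trapezoidal rule to the resulting points, as used for the empirical AUC.

```r
# Hypothetical helper: empirical ROC points from scores and 0/1 labels (1 = positive)
empirical_roc <- function(score, y, step = FALSE) {
  ord <- order(score, decreasing = TRUE)  # rank-order the data
  s <- score[ord]
  y <- y[ord]
  tp <- cumsum(y == 1)                    # positives at or above each score
  fp <- cumsum(y == 0)                    # negatives at or above each score
  keep <- if (step) {
    seq_along(s)                          # every pair; tied scores split in data order
  } else {
    which(c(diff(s) != 0, TRUE))          # last index of each tied block: diagonal jump
  }
  data.frame(Cutoff = c(Inf, s[keep]),
             FPR    = c(0, fp[keep] / sum(y == 0)),
             TPR    = c(0, tp[keep] / sum(y == 1)))
}

# Trapezoidal rule over consecutive (FPR, TPR) points
trapezoid_auc <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Small example with ties at 0.8 and 0.6
score <- c(0.9, 0.8, 0.8, 0.7, 0.6, 0.6, 0.4, 0.3)
y     <- c(1, 1, 0, 1, 0, 1, 0, 0)
roc_diag <- empirical_roc(score, y, step = FALSE)
roc_step <- empirical_roc(score, y, step = TRUE)
trapezoid_auc(roc_diag$FPR, roc_diag$TPR)  # 0.8125
```

With step = FALSE the eight observations give seven ROC points (including the origin), because each tied pair collapses into one diagonal segment. The trapezoidal AUC of that curve, 13/16 = 0.8125, equals the Mann-Whitney statistic with ties counted as one half.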
For "empirical" ROC, the trapezoidal rule is applied to estimate the area under the curve (AUC). For "binormal", AUC is estimated by \Phi(A/\sqrt{1 + B^2}), where \Phi is the standard normal CDF and A and B are functions of the means and variances of the diagnostic score in the two groups. For "nonparametric", AUC is estimated by
\frac{1}{n_Y n_{\bar{Y}}} \sum_{i=1}^{n_{\bar{Y}}} \sum_{j=1}^{n_Y} \Phi\left( \frac{D_{Y_j} - D_{\bar{Y}_i}}{\sqrt{h_Y^2 + h_{\bar{Y}}^2}} \right)
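Both model-based AUC formulas fit in a few lines of base R. The text does not define A and B, so the sketch assumes the standard binormal parametrization A = (\mu_Y - \mu_{\bar{Y}})/\sigma_Y and B = \sigma_{\bar{Y}}/\sigma_Y (see, e.g., Pepe 2003), with maximum likelihood (divisor n) standard deviations. The helper names binormal_auc and nonparametric_auc and the simulated data are hypothetical; this is not code taken from ROCit.

```r
# Binormal AUC under the assumed standard parametrization:
#   A = (mu_pos - mu_neg) / sd_pos,  B = sd_neg / sd_pos,  AUC = Phi(A / sqrt(1 + B^2))
binormal_auc <- function(pos_D, neg_D) {
  mle_sd <- function(d) sqrt(mean((d - mean(d))^2))  # ML estimate uses divisor n
  A <- (mean(pos_D) - mean(neg_D)) / mle_sd(pos_D)
  B <- mle_sd(neg_D) / mle_sd(pos_D)
  pnorm(A / sqrt(1 + B^2))
}

# Nonparametric AUC: mean of Phi((D_Yj - D_Ybar_i) / sqrt(h_Y^2 + h_Ybar^2))
# over all positive/negative pairs, with the bandwidth rule from "Details"
nonparametric_auc <- function(pos_D, neg_D) {
  bw <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)
  h <- sqrt(bw(pos_D)^2 + bw(neg_D)^2)
  mean(pnorm(outer(pos_D, neg_D, "-") / h))  # entry [j, i] is D_Yj - D_Ybar_i
}

set.seed(3)
neg_D <- rnorm(300, mean = 0, sd = 1)
pos_D <- rnorm(300, mean = 2, sd = 1)
binormal_auc(pos_D, neg_D)       # population value: pnorm(2 / sqrt(2)), about 0.921
nonparametric_auc(pos_D, neg_D)
```

Under this parametrization A/\sqrt{1 + B^2} simplifies to (\mu_Y - \mu_{\bar{Y}})/\sqrt{\sigma_Y^2 + \sigma_{\bar{Y}}^2}, so two identical groups give an AUC of exactly 0.5.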
Value
A list of class "rocit" with the following elements:
method | The method applied to estimate the ROC curve. |
pos_count | Number of positive responses. |
neg_count | Number of negative responses. |
pos_D | Array of diagnostic scores in positive responses. |
neg_D | Array of diagnostic scores in negative responses. |
AUC | Area under the curve. See "Details". |
Cutoff | Array of cutoff values at which the true positive rates and false positive rates are evaluated. Applicable for … |
param | Maximum likelihood estimates of the normal-distribution parameters of the diagnostic score in each group (method = "binormal"). |
TPR | Array of true positive rates (or sensitivities or recalls), evaluated at the cutoff values. |
FPR | Array of false positive rates (or 1 - specificity), evaluated at the cutoff values. |
Note
The algorithm is designed for complete cases. If NA values are found in either score or class, the affected observations are removed.
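The complete-case rule can be sketched as a small preprocessing step. The helper prepare_input is hypothetical and not part of ROCit; it also assumes, based on the argument name and the Examples, that observations whose class equals negref are coded 0 (negative) and all other observations 1 (positive).

```r
# Hypothetical helper (not ROCit's code): drop incomplete cases, then code
# class as 0 (equal to negref, negative) or 1 (any other value, positive)
prepare_input <- function(score, class, negref) {
  keep  <- !is.na(score) & !is.na(class)  # complete cases only
  score <- score[keep]
  class <- class[keep]
  y <- ifelse(class == negref, 0L, 1L)
  list(score = score, y = y)
}

# One missing score and one missing class label are dropped
prep <- prepare_input(score  = c(2.1, NA, 3.4, 1.2, 5.0),
                      class  = c("-", "+", NA, "-", "+"),
                      negref = "-")
prep$score  # 2.1 1.2 5.0
prep$y      # 0 0 1
```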
References
Pepe, Margaret Sullivan. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, 2003.
Zou, Kelly H., W. J. Hall, and David E. Shapiro. "Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests." Statistics in Medicine 16, no. 19 (1997): 2143-2156.
See Also
ciROC, ciAUC, plot.rocit, gainstable, ksplot
Examples
library(ROCit)
# --------------------- fit empirical and binormal ROC curves
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-")  # default method: "empirical"
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "binormal")
# --------------------- summary of each "rocit" object
summary(roc_empirical)
summary(roc_binormal)
# --------------------- ROC curve plots
plot(roc_empirical)
plot(roc_binormal, col = c("#00BA37", "#F8766D"),
     legend = FALSE, YIndex = FALSE)