cutpointr {cutpointr}  R Documentation 
Determine and evaluate optimal cutpoints
Description
Using predictions (or, e.g., biological marker values) and binary class labels, this function
will determine "optimal" cutpoints using various selectable methods. The
methods for cutpoint determination can be evaluated using bootstrapping. An
estimate of the cutpoint variability and the out-of-sample performance can then
be returned with summary or plot. For an introduction to the package, please see
vignette("cutpointr", package = "cutpointr").
Usage
cutpointr(...)
## Default S3 method:
cutpointr(
data,
x,
class,
subgroup = NULL,
method = maximize_metric,
metric = sum_sens_spec,
pos_class = NULL,
neg_class = NULL,
direction = NULL,
boot_runs = 0,
boot_stratify = FALSE,
use_midpoints = FALSE,
break_ties = median,
na.rm = FALSE,
allowParallel = FALSE,
silent = FALSE,
tol_metric = 1e-06,
...
)
## S3 method for class 'numeric'
cutpointr(
x,
class,
subgroup = NULL,
method = maximize_metric,
metric = sum_sens_spec,
pos_class = NULL,
neg_class = NULL,
direction = NULL,
boot_runs = 0,
boot_stratify = FALSE,
use_midpoints = FALSE,
break_ties = median,
na.rm = FALSE,
allowParallel = FALSE,
silent = FALSE,
tol_metric = 1e-06,
...
)
Arguments
... 
Further optional arguments that will be passed to method. minimize_metric and maximize_metric pass ... to metric. 
data 
A data.frame with the data needed for x, class and optionally subgroup. 
x 
The variable name to be used for classification, e.g. predictions. The raw vector of values if the data argument is unused. 
class 
The variable name indicating class membership. If the data argument is unused, the vector of raw class labels. 
subgroup 
An additional covariate that identifies subgroups or the raw data if data = NULL. Separate optimal cutpoints will be determined per group. Numeric, character and factor are allowed. 
method 
(function) A function for determining cutpoints. Can be user-supplied or one of the built-in methods. See Details. 
metric 
(function) The function for computing a metric when using maximize_metric or minimize_metric as method, and for scoring the out-of-bag values during bootstrapping as a way of internally validating the performance. User-defined functions can be supplied; see Details. 
pos_class 
(optional) The value of class that indicates the positive class. 
neg_class 
(optional) The value of class that indicates the negative class. 
direction 
(character, optional) Use ">=" or "<=" to indicate whether x is supposed to be larger or smaller for the positive class. 
boot_runs 
(numeric) If positive, this number of bootstrap samples will be used to assess the variability and the out-of-sample performance. 
boot_stratify 
(logical) If the bootstrap is stratified, bootstrap samples are drawn separately in both classes and then combined, keeping the proportion of positives and negatives constant in every resample. 
use_midpoints 
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. 
break_ties 
If multiple cutpoints are found, they can be summarized using this function, e.g. mean or median. To return all cutpoints use c as the function. 
na.rm 
(logical) Set to TRUE (default FALSE) to keep only complete cases of x, class and subgroup (if specified). Missing values with na.rm = FALSE will raise an error. 
allowParallel 
(logical) If TRUE, the bootstrapping will be parallelized using foreach. A parallel backend, for example a local cluster, has to be started and registered manually beforehand. 
silent 
(logical) If TRUE suppresses all messages. 
tol_metric 
All cutpoints will be returned that lead to a metric
value in the interval [m_max - tol_metric, m_max + tol_metric], where
m_max is the maximum achievable metric value. This can be used to return
multiple decent cutpoints and to avoid floating-point problems. Not supported
by all method functions; see Details.
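To illustrate what boot_stratify changes, the following base R sketch draws the row indices of one bootstrap resample, either from all rows at once or separately within each class. The helper draw_boot_indices is hypothetical and not part of cutpointr; it only mirrors the resampling scheme described for boot_stratify.

```r
# Hypothetical helper (not part of cutpointr): row indices of one bootstrap
# resample, optionally stratified by class
draw_boot_indices <- function(class, stratify = FALSE) {
  n <- length(class)
  if (!stratify) {
    # Ordinary bootstrap: n draws with replacement from all rows
    return(sample.int(n, n, replace = TRUE))
  }
  # Stratified bootstrap: resample within each class, then combine, so every
  # resample keeps the original number of positives and negatives
  idx_by_class <- split(seq_len(n), class)
  resampled <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), length(idx), replace = TRUE)]
  })
  unlist(resampled, use.names = FALSE)
}
```

With an unstratified bootstrap the class proportions vary between resamples, which matters most when one class is rare.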
Details
If direction and/or pos_class and neg_class are not given, the function will
assume that higher values indicate the positive class and use the class
with a higher median as the positive class.
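A minimal sketch of this default rule, assuming direction ">=" (higher values indicate the positive class): the class whose median of x is higher is taken as the positive class. The function guess_pos_class is hypothetical; cutpointr's internal code may differ, e.g. in how equal medians are handled.

```r
# Hypothetical sketch (not cutpointr's internal code): pick the class with
# the higher median of x as the positive class
guess_pos_class <- function(x, class) {
  med <- tapply(x, class, median)    # median of x within each class
  if (length(med) != 2) stop("class must have exactly two distinct values")
  pos <- names(med)[which.max(med)]  # equal medians resolve to the first class
  neg <- setdiff(names(med), pos)
  list(pos_class = pos, neg_class = neg, direction = ">=")
}

guess_pos_class(c(1, 2, 3, 8, 9, 10), c("no", "no", "no", "yes", "yes", "yes"))
```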
This function uses tidyeval to support unquoted arguments. For programming
with cutpointr, the operator !! can be used to unquote an argument; see the
examples.
Different methods can be selected for determining the optimal cutpoint via the method argument. The package includes the following method functions:

maximize_metric: Maximize the metric function
minimize_metric: Minimize the metric function
maximize_loess_metric: Maximize the metric function after LOESS smoothing
minimize_loess_metric: Minimize the metric function after LOESS smoothing
maximize_spline_metric: Maximize the metric function after spline smoothing
minimize_spline_metric: Minimize the metric function after spline smoothing
maximize_boot_metric: Maximize the metric function as a summary of the optimal cutpoints in bootstrapped samples
minimize_boot_metric: Minimize the metric function as a summary of the optimal cutpoints in bootstrapped samples
oc_youden_kernel: Maximize the Youden-Index after kernel smoothing the distributions of the two classes
oc_youden_normal: Maximize the Youden-Index parametrically assuming normally distributed data in both classes
oc_manual: Specify the cutpoint manually
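For intuition about the parametric idea behind oc_youden_normal, here is a base R sketch that fits a normal distribution to each class and numerically maximizes the Youden-Index J(c) = sensitivity + specificity - 1, which under that model equals pnorm(c, mean_neg, sd_neg) - pnorm(c, mean_pos, sd_pos) when higher values indicate the positive class. It uses stats::optimize() and is an illustration only, not the package's implementation; the name youden_normal_cut is hypothetical.

```r
# Hypothetical sketch: Youden-optimal cutpoint under a normal model,
# assuming higher x values indicate the positive class
youden_normal_cut <- function(x_pos, x_neg) {
  m_pos <- mean(x_pos); s_pos <- sd(x_pos)
  m_neg <- mean(x_neg); s_neg <- sd(x_neg)
  # J(c) = specificity + sensitivity - 1 under the fitted normals
  j <- function(cut) pnorm(cut, m_neg, s_neg) - pnorm(cut, m_pos, s_pos)
  # Search between the two class means (assumes m_neg < m_pos)
  optimize(j, interval = c(m_neg, m_pos), maximum = TRUE)$maximum
}

# Equal standard deviations: the optimum is close to the midpoint of the means
youden_normal_cut(x_pos = c(1, 2, 3), x_neg = c(-1, 0, 1))
```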
User-defined functions can be supplied to method, too. As a reference, the code of all included method functions can be accessed by simply typing their name. To define a new method function, create a function that may take as input(s):

data: A data.frame or tbl_df
x: (character) The name of the predictor or independent variable
class: (character) The name of the class or dependent variable
metric_func: A function for calculating a metric, e.g. accuracy
pos_class: The positive class
neg_class: The negative class
direction: ">=" if the positive class has higher x values, "<=" otherwise
tol_metric: (numeric) In the built-in methods, a tolerance around the optimal metric value
use_midpoints: (logical) In the built-in methods, whether to use midpoints instead of exact optimal cutpoints
...: Further arguments
The ... argument can be used to avoid an error if not all of the above
arguments are needed, or to pass additional arguments to method.
The function should return a data.frame or tbl_df with one row, the column
"optimal_cutpoint", and an optional column with an arbitrary name containing
the metric value at the optimal cutpoint.
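A minimal sketch of a user-defined method function following the interface above. The rule it implements (the cutpoint halfway between the two class medians) and the names oc_median_midpoint and metric_value are hypothetical illustrations; the function takes the inputs listed above, absorbs the rest through ..., and returns a one-row data.frame with an optimal_cutpoint column.

```r
# Hypothetical method function: cutpoint halfway between the class medians
oc_median_midpoint <- function(data, x, class, metric_func = NULL,
                               pos_class, neg_class, direction = ">=", ...) {
  xv <- data[[x]]       # x and class arrive as column names (character)
  cl <- data[[class]]
  cut <- (median(xv[cl == pos_class]) + median(xv[cl == neg_class])) / 2
  res <- data.frame(optimal_cutpoint = cut)
  if (!is.null(metric_func)) {
    # Score the cutpoint with the supplied metric from confusion-matrix counts
    pred_pos <- if (direction == ">=") xv >= cut else xv <= cut
    m <- metric_func(tp = sum(pred_pos & cl == pos_class),
                     fp = sum(pred_pos & cl == neg_class),
                     tn = sum(!pred_pos & cl == neg_class),
                     fn = sum(!pred_pos & cl == pos_class))
    res$metric_value <- as.numeric(m)  # optional, arbitrarily named column
  }
  res
}
```

Such a function would then be supplied as method = oc_median_midpoint in a call to cutpointr.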
Built-in metric functions include:

accuracy: Fraction correctly classified
youden: Youden- or J-Index = sensitivity + specificity - 1
sum_sens_spec: sensitivity + specificity
sum_ppv_npv: The sum of positive predictive value (PPV) and negative predictive value (NPV)
prod_sens_spec: sensitivity * specificity
prod_ppv_npv: The product of positive predictive value (PPV) and negative predictive value (NPV)
cohens_kappa: Cohen's Kappa
abs_d_sens_spec: The absolute difference between sensitivity and specificity
roc01: Distance to the point (0,1) on ROC space
abs_d_ppv_npv: The absolute difference between positive predictive value (PPV) and negative predictive value (NPV)
p_chisquared: The p-value of a chi-squared test on the confusion matrix of predictions and observations
odds_ratio: The odds ratio calculated as (TP / FP) / (FN / TN)
risk_ratio: The risk ratio (relative risk) calculated as (TP / (TP + FN)) / (FP / (FP + TN))
plr, nlr: The positive and negative likelihood ratio calculated as plr = true positive rate / false positive rate and nlr = false negative rate / true negative rate
misclassification_cost: The sum of the misclassification cost of false positives and false negatives, fp * cost_fp + fn * cost_fn. Additional arguments to cutpointr: cost_fp, cost_fn
total_utility: The total utility of true / false positives / negatives calculated as utility_tp * TP + utility_tn * TN - cost_fp * FP - cost_fn * FN. Additional arguments to cutpointr: utility_tp, utility_tn, cost_fp, cost_fn
F1_score: The F1-score (2 * TP) / (2 * TP + FP + FN)
sens_constrain: Maximize sensitivity given a minimal value of specificity
spec_constrain: Maximize specificity given a minimal value of sensitivity
metric_constrain: Maximize a selected metric given a minimal value of another selected metric
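As a cross-check of several of the formulas listed above, this base R sketch computes them from single confusion-matrix counts. The helper confusion_metrics is hypothetical; the built-in metric functions of cutpointr work on vectors of counts and return their results in their own format.

```r
# Hypothetical helper: several of the listed metrics from single
# confusion-matrix counts (scalars, not vectors)
confusion_metrics <- function(tp, fp, tn, fn) {
  sens <- tp / (tp + fn)  # true positive rate
  spec <- tn / (tn + fp)  # true negative rate
  c(
    accuracy      = (tp + tn) / (tp + fp + tn + fn),
    youden        = sens + spec - 1,
    sum_sens_spec = sens + spec,
    F1_score      = 2 * tp / (2 * tp + fp + fn),
    odds_ratio    = (tp / fp) / (fn / tn),
    plr           = sens / (1 - spec),  # TPR / FPR
    nlr           = (1 - sens) / spec   # FNR / TNR
  )
}

confusion_metrics(tp = 40, fp = 10, tn = 40, fn = 10)
```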
Furthermore, the following functions are included which can be used as metric
functions but are more useful for plotting purposes, for example in
plot_cutpointr, or for defining new metric functions: tp, fp, tn, fn, tpr,
fpr, tnr, fnr, false_omission_rate, false_discovery_rate, ppv, npv,
precision, recall, sensitivity, and specificity.
User-defined metric functions can be created as well. They can accept the following inputs as vectors:

tp: Vector of true positives
fp: Vector of false positives
tn: Vector of true negatives
fn: Vector of false negatives
...: If the metric function is used in conjunction with any of the maximize / minimize methods, further arguments can be passed.
The function should return a numeric vector, a matrix, or a data.frame
with one column. If the column is named,
the name will be included in the output and plots. Avoid using names that
are identical to the column names that are by default returned by cutpointr.
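A minimal sketch of a user-defined metric function following these rules: it accepts the vectors tp, fp, tn and fn, absorbs further arguments through ..., and returns a one-column matrix whose column name would then appear in the output. Balanced accuracy and the name balanced_accuracy are hypothetical choices for illustration.

```r
# Hypothetical metric: balanced accuracy, (sensitivity + specificity) / 2
balanced_accuracy <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  res <- matrix((sens + spec) / 2, ncol = 1)
  colnames(res) <- "balanced_accuracy"  # name shown in output and plots
  res
}

# Vectorized over several candidate cutpoints at once
balanced_accuracy(tp = c(8, 10), fp = c(2, 5), tn = c(8, 5), fn = c(2, 0))
```

It would be passed to cutpointr via metric = balanced_accuracy, e.g. together with method = maximize_metric.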
If boot_runs is positive, that number of bootstrap samples will be drawn
and the optimal cutpoint will be determined in each of them using method.
Additionally, as a way of internal validation, the function in metric will
be used to score the out-of-bag predictions using the cutpoints determined by
method. Various default metrics are always included in the bootstrap results.
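The bootstrap validation just described can be sketched in base R: in every run a cutpoint is chosen on the in-bag rows and then scored on the out-of-bag rows. This is not cutpointr's implementation. It assumes direction ">=", uses sensitivity + specificity both for choosing and for scoring, and redraws a resample until in-bag and out-of-bag rows contain both classes (an assumption of this sketch). All function names are hypothetical.

```r
# Hypothetical sketch (not cutpointr's implementation) of bootstrap validation
sum_sens_spec_at <- function(x, y, cut) {
  pred <- x >= cut                 # predicted positive (direction ">=")
  mean(pred[y]) + mean(!pred[!y])  # sensitivity + specificity
}

best_cut <- function(x, y) {
  cands <- sort(unique(x))
  scores <- vapply(cands, function(cc) sum_sens_spec_at(x, y, cc), numeric(1))
  cands[which.max(scores)]  # first maximum; cutpointr summarizes ties with break_ties
}

boot_validate <- function(x, y, boot_runs = 100) {
  n <- length(x)
  res <- data.frame(cutpoint = numeric(boot_runs),
                    oob_metric = numeric(boot_runs))
  for (b in seq_len(boot_runs)) {
    # Redraw until in-bag and out-of-bag rows both contain both classes
    repeat {
      in_bag <- sample.int(n, n, replace = TRUE)
      oob <- setdiff(seq_len(n), in_bag)
      if (length(unique(y[in_bag])) == 2 && length(unique(y[oob])) == 2) break
    }
    cut <- best_cut(x[in_bag], y[in_bag])
    res$cutpoint[b] <- cut
    res$oob_metric[b] <- sum_sens_spec_at(x[oob], y[oob], cut)
  }
  res
}
```

Here y is a logical vector (TRUE for the positive class). The spread of res$cutpoint then reflects the cutpoint variability, and the average of res$oob_metric gives an estimate of the out-of-sample performance.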
If multiple optimal cutpoints are found, the column optimal_cutpoint becomes a list that contains the vector(s) of the optimal cutpoints.
If use_midpoints = TRUE, the mean of the optimal cutpoint and the next
highest or lowest possible cutpoint is returned, depending on direction.
The tol_metric argument can be used to avoid floating-point problems
that may lead to exclusion of cutpoints that achieve the optimally achievable
metric value. Additionally, by selecting a large tolerance, multiple cutpoints
can be returned that lead to decent metric values in the vicinity of the
optimal metric value. tol_metric is passed to method and is only
supported by the maximization and minimization functions, i.e.
maximize_metric, minimize_metric, maximize_loess_metric,
minimize_loess_metric, maximize_spline_metric, and
minimize_spline_metric. In maximize_boot_metric and
minimize_boot_metric, multiple optimal cutpoints will be passed to the
summary_func of these two functions.
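The interplay of tol_metric and break_ties can be sketched for a maximized metric: every candidate cutpoint whose metric value lies in [m_max - tol_metric, m_max + tol_metric] is kept, and the kept cutpoints are summarized with the break_ties function. The helper select_cutpoints and the toy metric values are hypothetical; the built-in methods may differ in detail.

```r
# Hypothetical helper: keep all cutpoints within tol_metric of the best
# metric value, then summarize them with break_ties
select_cutpoints <- function(cutpoints, metric_values, tol_metric = 1e-6,
                             break_ties = median) {
  m_max <- max(metric_values)
  keep <- abs(metric_values - m_max) <= tol_metric
  break_ties(cutpoints[keep])
}

cuts <- c(1, 2, 3, 4)
vals <- c(1.5, 1.7, 1.7 - 1e-9, 1.2)        # cutpoints 2 and 3 are near-ties
select_cutpoints(cuts, vals)                # median of 2 and 3
select_cutpoints(cuts, vals, break_ties = c)      # all near-optimal cutpoints
select_cutpoints(cuts, vals, tol_metric = 0.25)   # larger tolerance keeps 1, 2, 3
```

With tol_metric = 0, the tiny floating-point difference would exclude cutpoint 3, which is the problem the default tolerance is meant to avoid.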
Value
A cutpointr object which is also a data.frame and tbl_df.
See Also
Other main cutpointr functions: add_metric(), boot_ci(), boot_test(),
multi_cutpointr(), predict.cutpointr(), roc()
Examples
library(cutpointr)
## Optimal cutpoint for dsi
data(suicide)
opt_cut <- cutpointr(suicide, dsi, suicide)
opt_cut
s_opt_cut <- summary(opt_cut)
plot(opt_cut)
## Not run:
## Predict class for new observations
predict(opt_cut, newdata = data.frame(dsi = 0:5))
## Supplying raw data, same result
cutpointr(x = suicide$dsi, class = suicide$suicide)
## direction, class labels, method and metric can be defined manually
## Again, same result
cutpointr(suicide, dsi, suicide, direction = ">=", pos_class = "yes",
          method = maximize_metric, metric = youden)
## Optimal cutpoint for dsi, as before, but for the separate subgroups
opt_cut <- cutpointr(suicide, dsi, suicide, gender)
opt_cut
(s_opt_cut <- summary(opt_cut))
tibble:::print.tbl(s_opt_cut)
## Bootstrapping also works on individual subgroups
set.seed(30)
opt_cut <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000,
                     boot_stratify = TRUE)
opt_cut
summary(opt_cut)
plot(opt_cut)
## Parallelized bootstrapping
library(doParallel)
library(doRNG)
cl <- makeCluster(2) # 2 cores
registerDoParallel(cl)
registerDoRNG(12) # Reproducible parallel loops using doRNG
opt_cut <- cutpointr(suicide, dsi, suicide, gender,
                     boot_runs = 1000, allowParallel = TRUE)
stopCluster(cl)
opt_cut
plot(opt_cut)
## Robust cutpoint method using kernel smoothing for optimizing Youden-Index
opt_cut <- cutpointr(suicide, dsi, suicide, gender,
                     method = oc_youden_kernel)
opt_cut
## End(Not run)