binning_by {dlookr} | R Documentation |
Optimal Binning for Scoring Modeling
Description
binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.
Usage
binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)
Arguments
.data |
a data frame. |
y |
character. name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1. A factor with two levels is also accepted; its type is converted during the calculation.
x |
character. name of the continuous characteristic variable. It must have at least 5 different values, and Inf is not allowed.
p |
numeric. percentage of records per bin. Default is 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
ordered |
logical. whether to build an ordered factor or not. |
labels |
character. the label names to use for each of the bins. |
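The constraints on y, x and p listed above can be checked before the call. The following is a minimal sketch in base R; check_binning_inputs() is a hypothetical helper written for illustration and is not part of dlookr, and the toy data frame is made up.

```r
# Hypothetical helper (not part of dlookr): checks the documented input
# constraints and returns a character vector of problems (empty if none).
check_binning_inputs <- function(data, y, x, p = 0.05) {
  stopifnot(is.data.frame(data), y %in% names(data), x %in% names(data))
  problems <- character(0)

  # y: only 0 and 1, or a factor with two levels
  yv <- data[[y]]
  if (is.factor(yv)) {
    if (nlevels(yv) != 2) {
      problems <- c(problems, "y: factor must have two levels")
    }
  } else if (!all(yv[!is.na(yv)] %in% c(0, 1))) {
    problems <- c(problems, "y: must contain only 0 and 1")
  }

  # x: numeric, no Inf, at least 5 different values
  xv <- data[[x]]
  if (!is.numeric(xv)) {
    problems <- c(problems, "x: must be numeric")
  } else {
    if (any(is.infinite(xv))) {
      problems <- c(problems, "x: Inf is not allowed")
    }
    if (length(unique(xv[!is.na(xv)])) < 5) {
      problems <- c(problems, "x: needs at least 5 different values")
    }
  }

  # p: greater than 0 and lower than 0.5
  if (!is.numeric(p) || length(p) != 1 || is.na(p) || p <= 0 || p >= 0.5) {
    problems <- c(problems, "p: must be greater than 0 and lower than 0.5")
  }
  problems
}

# Made-up toy data for illustration
toy <- data.frame(
  flag  = c(0, 1, 0, 1, 1, 0),
  score = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7)
)
check_binning_inputs(toy, "flag", "score")           # no problems reported
check_binning_inputs(toy, "flag", "score", p = 0.5)  # reports the p constraint
```

Returning the problems as a vector, rather than stopping at the first one, lets all violations be reported together.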
Details
This function is useful in combination with the mutate/transmute functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.
Value
an object of "optimal_bins" class. Attributes of the "optimal_bins" class are as follows.
class : "optimal_bins".
type : binning type, "optimal".
breaks : numeric. the number of intervals into which x is to be cut.
levels : character. levels of binned value.
raw : numeric. raw data, x argument value.
ivtable : data.frame. information value table.
iv : numeric. information value.
target : integer. binary response variable.
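To show what the iv attribute summarizes, the sketch below computes an information value from a binned variable and a 0/1 target using the standard weight-of-evidence definition. It is a base R illustration and not the smbinning or dlookr implementation; information_value() and the toy vectors are assumptions made for this example.

```r
# Illustrative sketch in base R; not the smbinning/dlookr implementation.
# `bin` holds the bin label of each record, `target` is a 0/1 vector.
information_value <- function(bin, target) {
  # rows: bins, columns: "0" and "1"
  tab <- table(bin, factor(target, levels = c(0, 1)))
  pct_zero <- tab[, "0"] / sum(tab[, "0"])  # share of all 0s falling in each bin
  pct_one  <- tab[, "1"] / sum(tab[, "1"])  # share of all 1s falling in each bin
  woe <- log(pct_one / pct_zero)            # weight of evidence per bin
  # A bin with no 0s or no 1s gives an infinite or undefined term;
  # this sketch does not handle that case.
  sum((pct_one - pct_zero) * woe)
}

# Made-up toy data: two bins with four records each
bin    <- c("a", "a", "a", "a", "b", "b", "b", "b")
target <- c(0, 0, 0, 1, 0, 1, 1, 1)
iv <- information_value(bin, target)
iv
```

Swapping the roles of 0 and 1 flips the sign of each weight of evidence but leaves the information value unchanged, because both factors of each term change sign.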
attributes of "optimal_bins" class
The attributes of the "optimal_bins" class are as follows.
class : "optimal_bins".
levels : character. factor or ordered factor levels
type : character. binning method
breaks : numeric. breaks for binning
raw : numeric. the raw data before binning
ivtable : data.frame. information value table
iv : numeric. information value
target : integer. binary response variable
See vignette("transformation") for an introduction to these concepts.
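The attributes listed above can be read with base R's attr(). The following is a minimal sketch, assuming that dlookr and its heartfailure dataset are installed and that the returned object carries the attributes documented on this page; other package versions may differ.

```r
library(dlookr)

# Assumes the heartfailure dataset shipped with dlookr is available.
bin <- binning_by(heartfailure, "death_event", "creatinine")

attr(bin, "breaks")   # breaks used for binning
attr(bin, "levels")   # levels of the binned value
attr(bin, "iv")       # information value
attr(bin, "ivtable")  # information value table
```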
See Also
binning, summary.optimal_bins, plot.optimal_bins.
Examples
library(dlookr)
library(dplyr)
# Generate data for the example: copy heartfailure and set 5 random values to NA
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA
# optimal binning with variable names given as character strings
bin <- binning_by(heartfailure2, "death_event", "creatinine")
# optimal binning with unquoted variable names
bin <- binning_by(heartfailure2, death_event, creatinine)
bin