R: Optimal Binning for Scoring Modeling

binning_by {dlookr}

R Documentation

Optimal Binning for Scoring Modeling

Description

binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.

Usage

binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

`.data`	a data frame.
`y`	character. name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1. A factor with two levels is also accepted; it is converted during the calculation.
`x`	character. name of the continuous characteristic variable. It must have at least 5 distinct values, and Inf is not allowed.
`p`	numeric. percentage of records per bin. Default is 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
`ordered`	logical. whether to build an ordered factor or not.
`labels`	character. the label names to use for each of the bins.
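The constraints listed for `y`, `x` and `p` can be checked before calling binning_by(). The following is a minimal sketch in base R; `check_binning_inputs()` is a hypothetical helper written for illustration and is not part of dlookr, and the checks binning_by() itself performs may differ.

```r
# Hypothetical helper (not part of dlookr): returns a character vector of
# violated constraints, or character(0) if the documented constraints hold.
check_binning_inputs <- function(.data, y, x, p = 0.05) {
  stopifnot(is.data.frame(.data), y %in% names(.data), x %in% names(.data))
  target <- .data[[y]]
  feature <- .data[[x]]
  problems <- character(0)

  # y: only the values 0 and 1, or a factor with exactly two levels
  y_ok <- if (is.factor(target)) {
    nlevels(target) == 2
  } else {
    is.numeric(target) && all(target[!is.na(target)] %in% c(0, 1))
  }
  if (!y_ok) problems <- c(problems, "y must be 0/1 or a two-level factor")

  # x: numeric, at least 5 distinct values, no Inf
  if (!is.numeric(feature)) {
    problems <- c(problems, "x must be numeric")
  } else {
    if (length(unique(feature[!is.na(feature)])) < 5)
      problems <- c(problems, "x needs at least 5 distinct values")
    if (any(is.infinite(feature)))
      problems <- c(problems, "x must not contain Inf")
  }

  # p: strictly between 0 and 0.5
  if (!(is.numeric(p) && length(p) == 1 && p > 0 && p < 0.5))
    problems <- c(problems, "p must be greater than 0 and lower than 0.5")

  problems
}

# Toy data with an Inf in x, checked with an out-of-range p
toy <- data.frame(y = c(0, 1, 0, 1, 1, 0),
                  x = c(1.2, 3.4, Inf, 2.2, 5.1, 4.0))
check_binning_inputs(toy, "y", "x", p = 0.6)
```

Returning the list of problems instead of stopping at the first one makes it easier to screen many candidate variables in one pass.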

Details

This function is useful when used with the mutate()/transmute() functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.

Value

An object of class "optimal_bins". The attributes of the "optimal_bins" class are as follows.

class : "optimal_bins".
type : binning type, "optimal".
breaks : numeric. the break points used to cut x into intervals.
levels : character. levels of binned value.
raw : numeric. raw data, x argument value.
ivtable : data.frame. information value table.
iv : numeric. information value.
target : integer. binary response variable.

Attributes of "optimal_bins" class

The attributes of the "optimal_bins" class are as follows.

class : "optimal_bins".
levels : character. factor or ordered factor levels
type : character. binning method
breaks : numeric. breaks for binning
raw : numeric. the raw data before binning
ivtable : data.frame. information value table
iv : numeric. information value
target : integer. binary response variable
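To make the `ivtable` and `iv` attributes more concrete, the following base R sketch computes weight of evidence (WoE) and information value (IV) for a characteristic that has already been binned. The toy data and variable names are assumptions made for illustration; the sketch uses the common definition IV = sum((%events - %non-events) * WoE) and does not reproduce the exact table that smbinning builds.

```r
# Toy data (illustrative only): binary target (1 = event) and a binned variable
target <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1)
bins <- factor(rep(c("low", "mid", "high"), each = 4),
               levels = c("low", "mid", "high"))

# Event / non-event counts per bin
counts <- table(bins, target)

# Share of all events and of all non-events that fall into each bin
pct_event <- counts[, "1"] / sum(counts[, "1"])
pct_nonevent <- counts[, "0"] / sum(counts[, "0"])

# Weight of evidence per bin; swapping numerator and denominator flips the
# sign of WoE but leaves IV unchanged
woe <- log(pct_event / pct_nonevent)

# Information value: contribution of each bin, then the total
iv_by_bin <- (pct_event - pct_nonevent) * woe
iv <- sum(iv_by_bin)

data.frame(bin = levels(bins), woe = as.numeric(woe),
           iv = as.numeric(iv_by_bin))
iv
```

The sketch assumes every bin contains both events and non-events; a bin with a zero count would give an infinite WoE and needs separate handling.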

See vignette("transformation") for an introduction to these concepts.

Examples

library(dlookr)
library(dplyr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning, passing the variable names as character strings
bin <- binning_by(heartfailure2, "death_event", "creatinine")

# optimal binning, passing the variable names unquoted
bin <- binning_by(heartfailure2, death_event, creatinine)
bin
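The attributes listed under Value can be read from the result with base R's attr(). The following is a hedged sketch, assuming dlookr is installed and that the call returns an "optimal_bins" object for this variable; the attribute names are taken from the Value section above.

```r
# Assumes dlookr is installed; the block is skipped otherwise.
if (requireNamespace("dlookr", quietly = TRUE)) {
  library(dlookr)

  bin <- binning_by(heartfailure, "death_event", "creatinine")

  # Defensive check: only read attributes if an "optimal_bins" object came back
  if (inherits(bin, "optimal_bins")) {
    print(attr(bin, "type"))           # binning type
    print(attr(bin, "breaks"))         # breaks used for binning
    print(attr(bin, "iv"))             # information value
    print(head(attr(bin, "ivtable")))  # information value table
  }
}
```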

[Package dlookr version 0.6.3 Index]