sts.bin {monobin} | R Documentation |
Four-stage monotonic binning procedure with statistical test correction
Description
sts.bin implements an extension of the three-stage monotonic binning procedure (iso.bin) with a final step of iterative merging of adjacent bins based on a statistical test.
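The final merging step can be pictured with the following minimal sketch for a binary target. It is an illustration only, not the package's actual implementation: the helper merge_adjacent_bins, its arguments events and n, the two-sided test, and the rule "merge the adjacent pair with the largest p-value until all pairs differ significantly" are all assumptions made for this example.

```r
# Hypothetical helper, not part of monobin: merges adjacent bins of a binary
# target until every adjacent pair differs significantly (assumed rule).
merge_adjacent_bins <- function(events, n, p.val = 0.05) {
  while (length(n) > 1) {
    # Two-proportion test for each pair of adjacent bins
    p <- sapply(seq_len(length(n) - 1), function(i) {
      suppressWarnings(
        prop.test(x = events[i:(i + 1)], n = n[i:(i + 1)],
                  correct = FALSE)$p.value
      )
    })
    # Treat an undefined p-value as "not distinguishable"
    p[is.na(p)] <- 1
    # Stop once all adjacent pairs are significantly different
    if (all(p < p.val)) break
    # Merge the least distinguishable pair (largest p-value)
    i <- which.max(p)
    events[i] <- events[i] + events[i + 1]
    n[i] <- n[i] + n[i + 1]
    events <- events[-(i + 1)]
    n <- n[-(i + 1)]
  }
  data.frame(n = n, events = events, rate = events / n)
}

# Toy input: four bins of 100 observations each (made-up counts)
merged <- merge_adjacent_bins(events = c(10, 12, 30, 60),
                              n = c(100, 100, 100, 100))
merged
```

In this toy input the first two bins have similar event rates, so they are merged, and the remaining three bins stay separate.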
Usage
sts.bin(
  x,
  y,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  y.type = NA,
  min.pct.obs = 0.05,
  min.avg.rate = 0.01,
  p.val = 0.05,
  force.trend = NA
)
Arguments
x |
Numeric vector to be binned. |
y |
Numeric target vector (binary or continuous). |
sc |
Numeric vector with special case elements. Default values are c(NA, NaN, Inf, -Inf). |
sc.method |
Defines how special cases are treated: all together or in separate bins. Default is "together". |
y.type |
Type of y (binary or continuous). Default is NA. |
min.pct.obs |
Minimum percentage of observations per bin. Default is 0.05 or minimum 30 observations. |
min.avg.rate |
Minimum average rate of y. Default is 0.01. |
p.val |
Threshold for the p-value of the statistical test. Default is 0.05. For a binary target, a test of two proportions is applied; for a continuous target, a two-sample independent t-test is applied. |
force.trend |
Whether the expected trend should be forced. Default is NA. |
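The two tests named under p.val can be tried in base R on toy data. The sketch below is only an illustration: the counts, the target values, the variable names, the one-sided alternative, and the decision rule (keep two adjacent bins separate only when the p-value falls below p.val) are assumptions for this example, not a statement of how sts.bin applies the tests internally.

```r
p.val <- 0.05  # same value as the default threshold in Usage

# Binary target: made-up event counts and bin sizes for two adjacent bins
events <- c(10, 30)
n.obs <- c(100, 100)
p.bin <- prop.test(x = events, n = n.obs,
                   alternative = "less", correct = FALSE)$p.value

# Continuous target: made-up target values in two adjacent bins
y.bin1 <- c(1.0, 1.2, 0.9, 1.1, 1.0, 0.8)
y.bin2 <- c(2.0, 2.1, 1.9, 2.2, 2.0, 1.8)
p.cont <- t.test(x = y.bin1, y = y.bin2, alternative = "less")$p.value

# Assumed decision rule: bins stay separate only if the p-value is below p.val
keep.separate <- c(binary = p.bin < p.val, continuous = p.cont < p.val)
keep.separate
```

Raising p.val makes it easier for adjacent bins to count as different, so under the assumed rule fewer bins would be merged.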
Value
The command sts.bin generates a list of two objects. The first object, the data frame summary.tbl, presents a summary table of the final binning, while the second, x.trans, is a vector of discretized values.
If x or y has a single unique value among the complete cases (cases other than special cases), it returns a data frame with info instead.
See Also
iso.bin for the three-stage monotonic binning procedure.
Examples
suppressMessages(library(monobin))
data(gcd)
# binary target
maturity.bin <- sts.bin(x = gcd$maturity, y = gcd$qual)
maturity.bin[[1]]
tapply(gcd$qual, maturity.bin[[2]], function(x) c(length(x), sum(x), mean(x)))
# test of two proportions for the first two bins
prop.test(x = c(sum(gcd$qual[maturity.bin[[2]] %in% "01 (-Inf,8)"]),
                sum(gcd$qual[maturity.bin[[2]] %in% "02 [8,16)"])),
          n = c(length(gcd$qual[maturity.bin[[2]] %in% "01 (-Inf,8)"]),
                length(gcd$qual[maturity.bin[[2]] %in% "02 [8,16)"])),
          alternative = "less",
          correct = FALSE)$p.value
# continuous target
age.bin <- sts.bin(x = gcd$age, y = gcd$qual, y.type = "cont")
age.bin[[1]]
# two-sample t-test for the first two bins
t.test(x = gcd$qual[age.bin[[2]] %in% "01 (-Inf,26)"],
       y = gcd$qual[age.bin[[2]] %in% "02 [26,35)"],
       alternative = "greater")$p.value