bin_cols {tidybins} | R Documentation |
Bin Cols
Description
Make bins in a tidy fashion. Adds a column to your data frame containing the integer codes of the specified bins of a certain column. Specifying multiple columns is intended only for supervised binning, so that multiple columns can be binned simultaneously and optimally with respect to a target variable.
Usage
bin_cols(
.data,
col,
n_bins = 10,
bin_type = "frequency",
...,
target = NULL,
pretty_labels = FALSE,
seed = 1,
method = "mdlp"
)
Arguments
.data |
a data frame |
col |
a column, vector of columns, or tidyselect |
n_bins |
number of bins |
bin_type |
method to make bins |
... |
parameters passed to the selected binning method |
target |
unquoted column for supervised binning |
pretty_labels |
logical. If TRUE, returns the interval label rather than the integer rank |
seed |
seed for stochastic binning (xgboost) |
method |
method to use when bin_type is "mdlp" |
Details
Description of the arguments for bin_type
- frequency (fr)
creates bins of equal content via quantiles. Wraps bin with method "content". Similar to ntile
- width (wi)
creates bins of equal numeric width. Wraps bin with method "length"
- kmeans (km)
creates bins using 1-dimensional kmeans. Wraps bin with method "clusters"
- value (va)
each bin has equal sum of values
- xgboost (xg)
column is binned by best predictor of a target column using step_discretize_xgb
- cart (ca)
if the col does not have enough distinct values, xgboost will fail and automatically revert to step_discretize_cart
- woe (wo)
column is binned by weight of evidence. Requires binary target
- logreg (lr)
column is binned by logistic regression. Requires binary target
- mdlp
uses the discretizeDF.supervised algorithm with a variety of methods
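To make the difference between the unsupervised bin types concrete, here is a minimal base-R sketch of the ideas behind "frequency", "width", "kmeans" and "value". It is an illustration under stated assumptions, not the tidybins implementation: the variable names (`freq_bin`, `width_bin`, `kmeans_bin`, `value_bin`) are hypothetical, and the "value" rule shown (cutting the cumulative sum of the sorted values into equal shares) is one plausible reading of "each bin has equal sum of values".

```r
# Illustrative only: base-R approximations of the unsupervised bin types.
x <- iris$Sepal.Length
n_bins <- 3

# frequency: cut at quantiles so each bin holds roughly the same number of rows.
# (cut() errors if ties make the quantile breaks non-unique; they are unique here.)
freq_breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
freq_bin <- cut(x, breaks = freq_breaks, include.lowest = TRUE, labels = FALSE)

# width: split the observed range into intervals of equal numeric width.
width_bin <- cut(x, breaks = n_bins, labels = FALSE)

# kmeans: 1-dimensional clustering, relabelled so bin 1 has the smallest center.
set.seed(1)
km <- kmeans(x, centers = n_bins)
kmeans_bin <- as.integer(rank(km$centers[, 1])[km$cluster])

# value (assumed rule): sort the values and cut the cumulative sum into equal
# shares, so each bin accounts for about 1 / n_bins of the column total.
ord <- order(x)
cum_share <- cumsum(x[ord]) / sum(x)
value_bin <- integer(length(x))
value_bin[ord] <- as.integer(pmin(n_bins, ceiling(cum_share * n_bins)))

# Compare how many rows land in each bin under each rule.
table(freq_bin)
table(width_bin)
table(kmeans_bin)
tapply(x, value_bin, sum)
```

Frequency bins balance row counts, width bins balance interval length, and value bins balance the column total, so the same column can yield quite different bin sizes depending on bin_type.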
Value
a data frame
Examples
library(tidybins)

iris %>%
  bin_cols(Sepal.Width, n_bins = 5, pretty_labels = TRUE) %>%
  bin_cols(Petal.Width, n_bins = 3, bin_type = c("width", "kmeans")) %>%
  bin_cols(Sepal.Width, bin_type = "xgboost", target = Species, seed = 1) -> iris1

# Binned columns are named by original name + method abbreviation + number of bins created.
# Sometimes the actual number of bins is less than n_bins if the column lacks enough variance.
iris1 %>%
  print(width = Inf)

iris1 %>%
  bin_summary() %>%
  print(width = Inf)