bin_by_interval {regressinator}    R Documentation

Group a data frame into bins

Description

Groups a data frame (similarly to dplyr::group_by()) based on the values of a column, either by dividing the column's range into equal-width intervals or by cutting it at quantiles.

Usage

bin_by_interval(.data, col, breaks = NULL)

bin_by_quantile(.data, col, breaks = NULL)

Arguments

.data

Data frame to bin

col

Column to bin by

breaks

Number of bins to create. bin_by_interval() also accepts a numeric vector of two or more unique cut points to use. If NULL, a default number of breaks is chosen based on the number of rows in the data. In bin_by_quantile(), if the number of unique values of the column is smaller than breaks, fewer bins will be produced.
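
The note about fewer bins can be illustrated with a small base R sketch. This is not the package's implementation; it only assumes that quantile-based cut points are computed with quantile() and that duplicated cut points are dropped. The names x, requested, q, and cuts are hypothetical.

```r
# Illustrative only: why heavily tied data can yield fewer quantile bins.
# Assumes cut points come from quantile() and duplicates are removed.
x <- c(1, 1, 1, 2, 2, 3)   # only three unique values
requested <- 5             # number of bins asked for

# One cut point per bin boundary: requested + 1 probabilities from 0 to 1.
q <- quantile(x, probs = seq(0, 1, length.out = requested + 1))

# Ties make several quantiles coincide, so some boundaries collapse.
cuts <- unique(q)

# Number of distinct intervals left after dropping duplicate boundaries.
length(cuts) - 1
```

With so few unique values, several quantiles fall on the same number, so fewer than the requested five intervals remain.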

Details

bin_by_interval() breaks the numerical range of col into equal-width intervals, or into the intervals specified by breaks when it is a vector of cut points. bin_by_quantile() splits the range into pieces based on quantiles of the data, so each interval contains roughly the same number of observations.
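
The difference between the binning strategies can be sketched in base R with cut() and quantile(). This is a rough illustration under the assumption that the bins behave like those produced by cut(); it is not how regressinator necessarily implements them, and the variable names are hypothetical.

```r
# Illustrative sketch in base R; not the regressinator implementation.
speed <- cars$speed   # built-in dataset with 50 rows
n_bins <- 5

# Equal-width bins: split the range of the column into n_bins intervals.
interval_bin <- cut(speed, breaks = n_bins, labels = FALSE)

# Explicit cut points: intervals are defined by the supplied vector.
manual_bin <- cut(speed, breaks = c(0, 10, 20, 30), labels = FALSE)

# Quantile bins: cut at quantiles so bins hold similar numbers of rows.
# unique() drops duplicated cut points, which arise when the data has ties.
q_breaks <- unique(quantile(speed, probs = seq(0, 1, length.out = n_bins + 1)))
quantile_bin <- cut(speed, breaks = q_breaks, include.lowest = TRUE,
                    labels = FALSE)

table(interval_bin)   # counts per equal-width bin
table(manual_bin)     # counts per user-specified interval
table(quantile_bin)   # counts per quantile bin, typically more even
```

Comparing the tables shows the trade-off: equal-width bins are easy to interpret but can be sparse at the extremes, while quantile bins keep group sizes closer together at the cost of uneven interval widths.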

Value

Grouped data frame, similar to those returned by dplyr::group_by(). An additional column .bin indicates the bin number for each group. Use dplyr::summarize(), or any other dplyr operation that works on groups, to calculate values within each group.

Examples

suppressMessages(library(dplyr))
cars |>
  bin_by_interval(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))

cars |>
  bin_by_quantile(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))

[Package regressinator version 0.1.3 Index]