bins {binr}    R Documentation

Cut Numeric Values Into Evenly Distributed Groups (Bins)

Description

bins - Cuts the points in vector x into evenly distributed groups (bins). bins takes three separate approaches to generating the cuts, picks the one resulting in the least mean squared deviation from the ideal cut (length(x) / target.bins points in each bin), and then merges small bins unless exact.groups is TRUE. The three approaches are:

  1. Use quantiles, and increase the number of even cuts up to max.breaks until the number of groups reaches the desired number. See bins.quantiles.

  2. Start with a single bin with all the data in it and perform bin splits until either the desired number of bins is reached or there's no reduction in error (the latter is ignored if exact.groups is TRUE). See bins.split.

  3. Start with length(table(x)) bins, each containing exactly one distinct value and merge bins until the desired number of bins is reached. If exact.groups is FALSE, continue merging until there's no further reduction in error. See bins.merge.
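The merge approach (approach 3) can be illustrated with a short sketch. It starts from one bin per distinct value, as described above, but swaps in a simplified greedy rule: always merge the adjacent pair with the smallest combined count, instead of the error-reduction criterion that bins.merge uses. The function name merge_bins_sketch is hypothetical and not part of binr, and the sketch stops at target.bins rather than continuing while the error drops.

```r
# Hypothetical sketch of approach 3 (not binr's bins.merge).
# Simplified greedy rule: merge the adjacent pair with the smallest combined
# count until only target.bins bins remain.
merge_bins_sketch <- function(x, target.bins) {
  counts <- as.vector(table(x))   # one bin per distinct value, in sorted order
  lo <- sort(unique(x))           # smallest value in each bin
  hi <- lo                        # largest value in each bin
  while (length(counts) > target.bins) {
    pair <- counts[-length(counts)] + counts[-1]  # sizes of neighbouring pairs
    i <- which.min(pair)          # merge bin i with bin i + 1
    counts[i] <- pair[i]
    hi[i] <- hi[i + 1]
    counts <- counts[-(i + 1)]
    lo <- lo[-(i + 1)]
    hi <- hi[-(i + 1)]
  }
  names(counts) <- paste0("[", lo, ", ", hi, "]")
  counts
}

x <- rep(c(1:10, 20, 31:40), c(rep(1, 10), 100, rep(1, 10)))
merge_bins_sketch(x, 3)
```

On the "Centered x" data from the examples, this simplified rule yields the same three groups, [1, 10], [20, 20] and [31, 40], holding 10, 100 and 10 points.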

For each of these approaches, redistribute points among the existing bins until there is no further decrease in error. See bins.move.
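A single redistribution pass might look like the sketch below. It assumes that bins are contiguous runs of sorted distinct values and that all points sharing a value move together. It also assumes the error is the sum of squared deviations of bin counts from length(x) / target.bins. The function move_pass_sketch and its argument layout (valct, ends) are hypothetical and not binr's bins.move.

```r
# Hypothetical sketch of a "move" pass (not binr's bins.move).
# valct[j] is the number of points equal to the j-th sorted distinct value;
# ends[b] is the index of the last distinct value in bin b, so the final
# element of ends equals length(valct).
move_pass_sketch <- function(valct, ends, target.bins) {
  m <- sum(valct) / target.bins              # ideal number of points per bin
  err <- function(e) {
    ct <- diff(c(0, cumsum(valct)[e]))       # points per bin for boundaries e
    sum((ct - m)^2)                          # squared deviation from ideal
  }
  repeat {
    improved <- FALSE
    for (b in seq_len(length(ends) - 1)) {   # the last boundary never moves
      for (delta in c(-1, 1)) {
        # delta = -1 moves the last value of bin b into bin b + 1;
        # delta = +1 moves the first value of bin b + 1 into bin b
        cand <- ends
        cand[b] <- cand[b] + delta
        # keep every bin non-empty and accept only strict improvements
        if (all(diff(c(0, cand)) > 0) && err(cand) < err(ends)) {
          ends <- cand
          improved <- TRUE
        }
      }
    }
    if (!improved) break                     # no single move lowers the error
  }
  ends
}

move_pass_sketch(c(5, 1, 1, 5), c(1, 4), 2)
```

Here the starting bins hold 5 and 7 points. Moving the second distinct value into the first bin gives 6 and 6, so the pass returns c(2, 4). Each accepted move strictly lowers the error, so the loop always terminates.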

bins.getvals - Extracts cut points from the object returned by bins. The cut points always lie between the values in x and are weighted such that each cut point splits the area under the line from (lo, n1) to (hi, n2) in half.
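The half-area rule has a closed form. Write the cut as lo + u * (hi - lo). The area under the line up to the cut is proportional to n1 * u + (n2 - n1) * u^2 / 2, and the total is proportional to (n1 + n2) / 2. Setting the first equal to half the second gives u = (sqrt((n1^2 + n2^2) / 2) - n1) / (n2 - n1), or u = 1/2 when n1 == n2. The sketch below reads lo and hi as the values on either side of the cut and n1 and n2 as the weights at those values. That reading is an assumption, since the text does not define them, and cut_point_sketch is a hypothetical helper, not binr code.

```r
# Hypothetical helper: place a cut between lo and hi so that it halves the
# area under the straight line from (lo, n1) to (hi, n2).
cut_point_sketch <- function(lo, hi, n1, n2) {
  if (n1 == n2) return((lo + hi) / 2)   # flat line: cut at the midpoint
  # u is the fraction of (hi - lo) solving
  #   (n2 - n1) / 2 * u^2 + n1 * u = (n1 + n2) / 4
  u <- (sqrt((n1^2 + n2^2) / 2) - n1) / (n2 - n1)
  lo + u * (hi - lo)
}

cut_point_sketch(0, 1, 1, 3)
```

With n1 = 1 and n2 = 3 on [0, 1], the cut lands at (sqrt(5) - 1) / 2, about 0.618. It moves toward hi, where the line is higher.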

bins.merr - Computes the error function that splitting, merging and moving optimize: the mean squared error of point counts in the bins relative to the optimal number of points per bin.
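As a rough illustration of this error function, the sketch below takes the optimal count to be sum(binct) / target.bins, which equals length(x) / target.bins, and averages the squared deviations over the bins actually present. The normalization is an assumption, since bins.merr may divide differently. The name merr_sketch is hypothetical.

```r
# Hypothetical sketch of the error function (bins.merr may normalize differently).
merr_sketch <- function(binct, target.bins) {
  ideal <- sum(binct) / target.bins   # optimal number of points per bin
  mean((binct - ideal)^2)             # mean squared deviation over the bins
}

merr_sketch(c(10, 100, 10), 3)
```

For the counts 10, 100 and 10 with target.bins = 3, the ideal is 40 points per bin. The squared deviations are 900, 3600 and 900, so the sketch returns 1800.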

Usage

bins(x, target.bins, max.breaks = NA, exact.groups = F, verbose = F,
  errthresh = 0.1, minpts = NA)

bins.getvals(lst, minpt = -Inf, maxpt = Inf)

bins.merr(binct, target.bins)

Arguments

x

Vector of numbers

target.bins

Number of groups desired; this is also the max number of groups.

max.breaks

Used for the initial cut. If exact.groups is FALSE, bins are merged until there are no bins with fewer than length(x) / max.breaks points. In bins, one of max.breaks and minpts must be supplied.

exact.groups

If TRUE, the result has exactly target.bins bins; if FALSE, the result may contain fewer than target.bins bins.

verbose

Indicates verbose output.

errthresh

If the error of the first rough estimate of the bins is below this value, bins stops there without further refinement.

minpts

Minimum number of points in a bin. In bins, one of max.breaks and minpts must be supplied.

lst

The list returned by the bins function.

minpt

The value replacing the lower bound of the cut points.

maxpt

The value replacing the upper bound of the cut points.

binct

The number of points falling into each bin.

Details

The gains are computed using incremental analytical expressions derived for moving a value from one bin to the next, splitting a bin into two, or merging two bins.
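To show what such incremental expressions look like, the sketch below derives them for a simplified error E = sum((binct - m)^2), where m is a fixed ideal count per bin. This is an assumption; the package's exact error normalization may differ, so its formulas may differ too. The helper names are hypothetical, and negative values mean the error decreases. The move delta does not depend on m.

```r
# Hypothetical closed-form changes in E = sum((binct - m)^2), m held fixed.
# Moving k points from a bin holding a points to a bin holding b points:
move_delta <- function(a, b, k) 2 * k * (b - a + k)
# Splitting a bin into parts of n1 and n2 points:
split_delta <- function(n1, n2, m) m^2 - 2 * n1 * n2
# Merging bins of a and b points (the inverse of a split):
merge_delta <- function(a, b, m) 2 * a * b - m^2

# Brute-force check of each formula against direct recomputation of E.
E <- function(ct, m) sum((ct - m)^2)
m <- 7
stopifnot(
  move_delta(12, 3, 4) == E(c(8, 7), m) - E(c(12, 3), m),
  split_delta(5, 9, m) == E(c(5, 9), m) - E(14, m),
  merge_delta(5, 9, m) == E(14, m) - E(c(5, 9), m)
)
```

Each candidate move, split or merge can then be scored in constant time without recounting every bin.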

Value

bins returns a list containing the following items (not all of them may be present):

bins.getvals returns a vector of cut points extracted from the lst object.

See Also

binr, bins.greedy, bins.quantiles, bins.optimize

Examples

## Not run: 
   # Seriously skewed x:
   x <- floor(exp(rnorm(200000 * 1.3)))
   cuts <- bins(x, target.bins = 10, minpts = 2000)
   cuts$breaks <- bins.getvals(cuts)
   cuts$binct
   #   [0, 0]    [1, 1]    [2, 2]    [3, 3]    [4, 4]    [5, 5]    [6, 7]   [8, 10]
   # 129868     66611     28039     13757      7595      4550      4623      2791
   #   [11, 199]
   # 2166

   # Centered x:
   x <- rep(c(1:10,20,31:40), c(rep(1, 10), 100, rep(1,10)))
   cuts <- bins(x, target.bins = 3, minpts = 10)
   cuts$binct
   # [1, 10] [20, 20] [31, 40]
   #      10      100       10

## End(Not run)

[Package binr version 1.1.1 Index]