bins {binr}    R Documentation
Cut Numeric Values Into Evenly Distributed Groups (Bins)
Description
bins - Cuts points in vector x into evenly distributed groups (bins). bins takes 3 separate approaches to generating the cuts, picks the one resulting in the least mean squared deviation from the ideal cut (length(x) / target.bins points in each bin), and then merges small bins unless exact.groups is TRUE.
The 3 approaches are:
1. Use quantiles, increasing the number of even cuts up to max.breaks until the number of groups reaches the desired number. See bins.quantiles.
2. Start with a single bin containing all the data and split bins until either the desired number of bins is reached or the error no longer decreases (the latter condition is ignored if exact.groups is TRUE). See bins.split.
3. Start with length(table(x)) bins, each containing exactly one distinct value, and merge bins until the desired number of bins is reached. If exact.groups is FALSE, continue merging until the error no longer decreases. See bins.merge.
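A minimal sketch of the third approach (bottom-up merging) in base R follows. It is not binr's bins.merge: the names merge_at and merge_bins_sketch are hypothetical, the error is assumed to be the mean squared deviation of the bin counts from length(x) / target.bins, only adjacent bins may merge, and the loop simply stops at target.bins bins rather than continuing while the error falls.

```r
# Hypothetical helper: merge bin i with bin i + 1 in a vector of bin counts.
merge_at <- function(ct, i) {
  c(ct[seq_len(i - 1)], ct[i] + ct[i + 1], ct[-seq_len(i + 1)])
}

# Hypothetical sketch of bottom-up merging (not binr's implementation).
merge_bins_sketch <- function(x, target.bins) {
  counts <- as.vector(table(x))          # one bin per distinct value, sorted
  ideal <- length(x) / target.bins       # ideal number of points per bin
  # Assumed error: mean squared deviation of the bin counts from the ideal
  err <- function(ct) mean((ct - ideal)^2)
  while (length(counts) > target.bins) {
    # Error after each possible merge of two adjacent bins
    trial <- vapply(seq_len(length(counts) - 1),
                    function(i) err(merge_at(counts, i)), numeric(1))
    # Greedily apply the merge that leaves the smallest error
    counts <- merge_at(counts, which.min(trial))
  }
  counts
}
```

On the centered data from the Examples, the cost of merging two bins under this assumed error grows with the product of their counts, so the small bins on each side are merged with each other before either is absorbed into the bin of 100, and the sketch ends with counts 10, 100, 10.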
For each of these approaches, points are then redistributed among the existing bins until the error no longer decreases. See bins.move.
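The redistribution step can be sketched as a local search that shifts a single distinct value across the boundary between two adjacent bins whenever that lowers the error. This is an illustration under assumptions, not bins.move: move_sketch, counts and bin_of are hypothetical names, the error is assumed to be the mean squared deviation of bin counts from the ideal count, and only the best move found in each pass is applied.

```r
# Hypothetical sketch of the redistribution step, not binr's bins.move.
# counts: number of points at each distinct value of x, in sorted order.
# bin_of: non-decreasing bin number (1..k) assigned to each distinct value.
move_sketch <- function(counts, bin_of, target.bins) {
  ideal <- sum(counts) / target.bins
  # Assumed error: mean squared deviation of bin counts from the ideal count
  err <- function(b) mean((vapply(split(counts, b), sum, numeric(1)) - ideal)^2)
  repeat {
    best <- bin_of
    best_err <- err(bin_of)
    for (i in seq_len(max(bin_of) - 1)) {
      left <- which(bin_of == i)
      right <- which(bin_of == i + 1)
      candidates <- list()
      # Move the highest value of bin i into bin i + 1 (never empty a bin)
      if (length(left) > 1) {
        b <- bin_of; b[max(left)] <- i + 1; candidates <- c(candidates, list(b))
      }
      # Move the lowest value of bin i + 1 into bin i (never empty a bin)
      if (length(right) > 1) {
        b <- bin_of; b[min(right)] <- i; candidates <- c(candidates, list(b))
      }
      # Keep the candidate with the lowest error seen so far
      for (b in candidates) {
        e <- err(b)
        if (e < best_err) { best <- b; best_err <- e }
      }
    }
    if (best_err >= err(bin_of)) break   # no move reduces the error
    bin_of <- best
  }
  bin_of
}
```

For example, move_sketch(c(5, 1, 1, 5), c(1, 1, 1, 2), 2) moves the third value into the second bin, turning bin counts of 7 and 5 into 6 and 6, and returns c(1, 1, 2, 2).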
bins.getvals - Extracts cut points from the object returned by bins. The cut points always fall between values of x and are weighted so that each cut point splits the area under the line from (lo, n1) to (hi, n2) in half.
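The half-area rule can be solved in closed form. Writing s for the fraction of the way from lo to hi, the area under the line up to the cut must equal half of the trapezoid, n1*s + (n2 - n1)*s^2/2 = (n1 + n2)/4, which gives s = (sqrt((n1^2 + n2^2)/2) - n1) / (n2 - n1) when n1 != n2 and s = 1/2 otherwise. The helper below is hypothetical and is not the bins.getvals source; reading lo and hi as the adjacent distinct values on either side of the cut and n1, n2 as their point counts is an assumption.

```r
# Hypothetical helper, not binr's bins.getvals. Assumes lo < hi are the
# adjacent distinct values on either side of a cut and n1, n2 their counts.
# Returns the point that splits the trapezoid under the line from (lo, n1)
# to (hi, n2) into two regions of equal area. Vectorized over its arguments.
half_area_cut <- function(lo, hi, n1, n2) {
  # Fraction s of the way from lo to hi solving
  #   n1 * s + (n2 - n1) * s^2 / 2 = (n1 + n2) / 4
  # (equal counts give the midpoint; the NaN from 0 / 0 is discarded by ifelse)
  s <- ifelse(n1 == n2, 0.5, (sqrt((n1^2 + n2^2) / 2) - n1) / (n2 - n1))
  lo + s * (hi - lo)
}

half_area_cut(2, 4, 7, 7)   # equal counts: the midpoint, 3
half_area_cut(0, 1, 0, 1)   # 1 / sqrt(2), pulled toward the heavier end
```

The cut therefore moves toward whichever neighboring value holds more points, since the denser side fills half of the area over a shorter distance.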
bins.merr - Computes the error function that splitting, merging and moving bins try to minimize: the mean squared error of the point counts in the bins relative to the optimal number of points per bin.
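A minimal stand-in for this error function, assuming the ideal count is sum(binct) / target.bins (which equals length(x) / target.bins when binct covers all of x) and that the mean is taken over the bins actually present. bin_mse is a hypothetical name; binr's own bins.merr may normalize differently or treat missing bins differently.

```r
# Hypothetical stand-in for bins.merr (not its actual source): mean squared
# deviation of the bin counts from the ideal count sum(binct) / target.bins.
bin_mse <- function(binct, target.bins) {
  ideal <- sum(binct) / target.bins
  mean((binct - ideal)^2)
}

bin_mse(c(10, 100, 10), 3)   # ideal = 40; (900 + 3600 + 900) / 3 = 1800
bin_mse(c(40, 40, 40), 3)    # perfectly even bins: 0
```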
Usage
bins(x, target.bins, max.breaks = NA, exact.groups = F, verbose = F,
errthresh = 0.1, minpts = NA)
bins.getvals(lst, minpt = -Inf, maxpt = Inf)
bins.merr(binct, target.bins)
Arguments
x             Vector of numbers.
target.bins   Number of groups desired; this is also the maximum number of groups.
max.breaks    Used for initial cut. If
exact.groups  If TRUE, the result will have exactly target.bins bins; if FALSE, the result may contain fewer than target.bins bins.
verbose       Indicates verbose output.
errthresh     If the error is below the provided value, stops after the first rough estimate of the bins.
minpts        Minimum number of points in a bin. In
lst           The list returned by the bins function.
minpt         The value replacing the lower bound of the cut points.
maxpt         The value replacing the upper bound of the cut points.
binct         The number of points falling into each bin.
Details
The error gains are computed using incremental analytical expressions derived for moving a value from one bin to the next, splitting a bin into two, and merging two bins.
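As an example of such an incremental expression, consider the move operation under the assumed error mean((binct - m)^2) over k bins, with m the ideal count. Moving a value holding v points from a bin with a points to a neighboring bin with b points changes only two terms, so the error drops by 2*v*(a - b - v)/k, and a move helps only when a - b > v. This is derived here from that assumed error rather than taken from the binr source, and move_gain is a hypothetical name; the code checks the shortcut against a full recomputation.

```r
# Hypothetical helper: decrease in the assumed error mean((binct - m)^2)
# when a value holding v points moves from a bin of a points to an adjacent
# bin of b points, with k bins in total. The ideal count m cancels out:
#   (a - v - m)^2 + (b + v - m)^2 - (a - m)^2 - (b - m)^2 = 2 * v * (b - a + v)
move_gain <- function(a, b, v, k) 2 * v * (a - b - v) / k

# Brute-force check on a small example
binct <- c(70, 20, 30)
ideal <- sum(binct) / length(binct)   # 40 with target.bins = 3
err <- function(ct) mean((ct - ideal)^2)
after <- binct + c(-15, 15, 0)        # move 15 points from bin 1 to bin 2

err(binct) - err(after)               # 350
move_gain(70, 20, 15, length(binct))  # 350
```

Because the gain needs only the two affected counts, each candidate move can be scored in constant time instead of recomputing the error over all bins.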
Value
A list containing the following items (not all of them may be present):
binlo - The "low" value falling into the bin.
binhi - The "high" value falling into the bin.
binct - The number of points falling into the bin.
xtbl - The result of a call to table(x).
xval - The sorted unique values of the data points x; essentially a numeric version of names(xtbl).
changed - Flag indicating whether the bins have been modified by the function.
err - Mean squared error between the resulting bin counts and the ideal bin count.
imax - For the move, merge and split operations, the index of the bin with the maximum gain.
iside - For the move operation, the side of the move: 0 = left, 1 = right.
gain - Error gain obtained as the result of the function call.
bins.getvals returns a vector of cut points extracted from the lst object.
See Also
binr, bins.greedy, bins.quantiles, bins.optimize
Examples
## Not run:
# Seriously skewed x:
x <- floor(exp(rnorm(200000 * 1.3)))
cuts <- bins(x, target.bins = 10, minpts = 2000)
cuts$breaks <- bins.getvals(cuts)
cuts$binct
# [0, 0] [1, 1] [2, 2] [3, 3] [4, 4] [5, 5] [6, 7] [8, 10]
# 129868 66611 28039 13757 7595 4550 4623 2791
# [11, 199]
# 2166
# Centered x:
x <- rep(c(1:10,20,31:40), c(rep(1, 10), 100, rep(1,10)))
cuts <- bins(x, target.bins = 3, minpts = 10)
cuts$binct
# [1, 10] [20, 20] [31, 40]
# 10 100 10
## End(Not run)