rsubbins {binsmooth} R Documentation

## Recursive subdivision PDF and CDF fitted to binned data

### Description

Creates a PDF and CDF based on a set of binned data, using recursive subdivision on a step function.

### Usage

```r
rsubbins(bEdges, bCounts, m = NULL, eps1 = 0.25, eps2 = 0.75, depth = 3,
         tailShape = c("onebin", "pareto", "exponential"),
         nTail = 16, numIterations = 20, pIndex = 1.160964, tbRatio = 0.8)
```

### Arguments

- `bEdges`: A vector e_1, e_2, …, e_n giving the right endpoints of the bins. The value e_n is ignored and assumed to be `Inf` or `NA`, indicating that the top bin is unbounded. The edges determine n bins on the intervals e_{i-1} ≤ x ≤ e_i, where e_0 is assumed to be 0.
- `bCounts`: A vector c_1, c_2, …, c_n giving the count for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative.
- `m`: An estimate of the mean of the distribution. If no value is supplied, the mean is estimated by temporarily setting e_n equal to 2e_{n-1}, and a warning message is generated.
- `eps1`: Parameter controlling how far the edges of the subdivided bins are shifted. Must be between 0 and 0.5.
- `eps2`: Parameter controlling how wide the middle subdivision of each bin should be. Must be between 0 and 1.
- `depth`: Number of times to subdivide the bins.
- `tailShape`: Must be one of `"onebin"`, `"pareto"`, or `"exponential"`.
- `nTail`: The number of bins used to form the initial tail, before recursive subdivision. Ignored if `tailShape` equals `"onebin"`.
- `numIterations`: The number of iterations used to optimize the tail to fit the mean. Ignored if `tailShape` equals `"onebin"`.
- `pIndex`: The Pareto index for the shape of the tail. Defaults to ln(5)/ln(4) ≈ 1.160964. Ignored unless `tailShape` equals `"pareto"`.
- `tbRatio`: The decay ratio for the tail bins. Ignored unless `tailShape` equals `"exponential"`.
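The rule for a missing `m` can be made concrete with a small base-R sketch. The helper `naive_bin_mean` is hypothetical (it is not part of binsmooth), and it assumes the temporary estimate is the mean of a step-function density that is uniform within each bin; the package's internal computation may differ. The last line checks the stated default for `pIndex`.

```r
# Hypothetical helper, not part of binsmooth: mean of a step-function density
# (uniform within each bin) after temporarily closing the top bin at 2 * e_{n-1}.
naive_bin_mean <- function(bEdges, bCounts) {
  n <- length(bEdges)
  upper <- bEdges
  upper[n] <- 2 * upper[n - 1]          # replace the ignored top edge (Inf or NA)
  lower <- c(0, upper[-n])              # e_0 is assumed to be 0
  mids <- (lower + upper) / 2           # mean of a uniform distribution on each bin
  sum(bCounts * mids) / sum(bCounts)
}

naive_bin_mean(c(10, 20, NA), c(1, 2, 1))   # 16.25

# Default Pareto index documented for `pIndex`
log(5) / log(4)                             # about 1.160964
```

In the toy call the top bin becomes [20, 40], so the bin midpoints are 5, 15 and 30, and the count-weighted average is 65/4 = 16.25.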

### Details

First, a step function PDF is created, as described in `stepbins`. The bins of the resulting PDF are then recursively subdivided and shifted in a manner that preserves the area of the original bins, resulting in a step function with finer bins.
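The general idea of area-preserving subdivision can be illustrated with a simplified sketch. This is a hypothetical scheme, not the algorithm `rsubbins` implements: the functions `refine_step` and `subdivide` and the parameter `shift` are illustrative names, and the sketch does not use `eps1` or `eps2`. Each bin is split at its midpoint, and the two halves are tilted toward the taller neighbour by equal and opposite amounts, so every bin keeps its area.

```r
# Hypothetical area-preserving refinement (NOT binsmooth's internal scheme).
# `edges` has length k + 1 (including the left endpoint); `heights` has length k.
refine_step <- function(edges, heights, shift = 0.25) {
  k <- length(heights)
  # Neighbouring heights; missing neighbours at the ends are taken as the bin itself
  left_nb  <- c(heights[1], heights[-k])
  right_nb <- c(heights[-1], heights[k])
  # Raise the half facing the taller neighbour and lower the other half equally
  d <- shift * (left_nb - right_nb) / 2
  d <- pmax(pmin(d, heights), -heights)       # keep both halves nonnegative
  mids <- (edges[-1] + edges[-(k + 1)]) / 2   # split point of each bin
  new_edges <- sort(c(edges, mids))
  # Interleave (left half, right half) for each bin, matching the sorted edges
  new_heights <- as.vector(rbind(heights + d, heights - d))
  list(edges = new_edges, heights = new_heights)
}

# Apply the refinement `depth` times
subdivide <- function(edges, heights, depth = 3, shift = 0.25) {
  for (i in seq_len(depth)) {
    r <- refine_step(edges, heights, shift)
    edges <- r$edges
    heights <- r$heights
  }
  list(edges = edges, heights = heights)
}

e <- c(0, 10, 20, 40)
h <- c(1, 2, 1) / (4 * diff(e))               # step-function PDF with total area 1
s <- subdivide(e, h, depth = 3)
sum(s$heights * diff(s$edges))                # total area is still 1
```

Because the two halves of a bin have equal width and heights h + d and h - d, the bin's area stays exactly h times its width after each pass. The areas of the original bins therefore survive any number of passes, and clamping |d| ≤ h keeps the density nonnegative.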

The methods `stepbins` and `rsubbins` are included in this package mainly for the purpose of comparison. For most use cases, `splinebins` will produce more accurate smoothing results.

### Value

Returns a list with the following components.

- `rsubPDF`: A `stepfun` function giving the fitted PDF.
- `rsubCDF`: A piecewise-linear `approxfun` function giving the CDF.
- `E`: The right-hand endpoint of the support of the PDF.
- `shrinkFactor`: If the supplied estimate of the mean is too small to be fitted with a step function, the bin edges are scaled by `shrinkFactor`, which is chosen slightly less than 1.
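The kinds of objects returned can be reproduced in base R. The sketch below is hypothetical: `make_bin_fns` is not part of binsmooth, it skips tail fitting and subdivision, and it assumes the top bin has already been closed at a finite edge E. It builds a `stepfun` PDF and a piecewise-linear `approxfun` CDF, then recovers the mean as the integral of 1 - CDF over [0, E], the same identity used in the Examples.

```r
# Hypothetical sketch: objects analogous to rsubPDF, rsubCDF and E,
# built from bins whose top edge has already been closed at a finite E.
make_bin_fns <- function(edges, counts) {
  # edges: e_0 = 0, e_1, ..., e_n with e_n = E finite; counts: c_1, ..., c_n
  N <- sum(counts)
  heights <- counts / (N * diff(edges))           # uniform density within each bin
  pdf <- stepfun(edges, c(0, heights, 0))         # zero outside [0, E]
  cdf <- approxfun(edges, c(0, cumsum(counts) / N), yleft = 0, yright = 1)
  list(PDF = pdf, CDF = cdf, E = edges[length(edges)])
}

fns <- make_bin_fns(c(0, 10, 20, 40), c(1, 2, 1))
fns$PDF(c(5, 15, 30))                             # 0.025 0.050 0.0125
# Mean as the integral of 1 - CDF over [0, E]
integrate(function(x) 1 - fns$CDF(x), 0, fns$E)$value   # approximately 16.25
```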

### Author(s)

David J. Hunter and McKalie Drown

### References

Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/

### See Also

`stepbins`

### Examples

```r
# 2005 ACS data from Cook County, Illinois
library(binsmooth)
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
rsb <- rsubbins(binedges, bincounts, 76091, tailShape = "pareto")

plot(rsb$rsubPDF, do.points = FALSE)
plot(rsb$rsubCDF, 0, rsb$E)

library(pracma)
integral(rsb$rsubPDF, 0, rsb$E)                          # total area, close to 1
integral(function(x) {1 - rsb$rsubCDF(x)}, 0, rsb$E)     # approximates the mean
```

