Creates a PDF and CDF based on a set of binned data, using recursive subdivision on a step function.
rsubbins(bEdges, bCounts, m=NULL, eps1 = 0.25, eps2 = 0.75, depth = 3, tailShape = c("onebin", "pareto", "exponential"), nTail=16, numIterations=20, pIndex=1.160964, tbRatio=0.8)
bEdges 
A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored, and the top bin is treated as unbounded (the example below passes NA).
bCounts 
A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative. 
m 
An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated.
eps1 
Parameter controlling how far the edges of the subdivided bins are shifted. Must be between 0 and 0.5. 
eps2 
Parameter controlling how wide the middle subdivision of each bin should be. Must be between 0 and 1.
depth 
Number of times to subdivide the bins. 
tailShape 
Must be one of "onebin", "pareto", or "exponential".
nTail 
The number of bins to use to form the initial tail, before recursive subdivision. Ignored if tailShape equals "onebin".
numIterations 
The number of iterations to optimize the tail to fit the mean. Ignored if tailShape equals "onebin".
pIndex 
The Pareto index for the shape of the tail. Defaults to ln(5)/ln(4). Ignored unless tailShape equals "pareto".
tbRatio 
The decay ratio for the tail bins. Ignored unless tailShape equals "exponential".
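The fallback used when m is omitted can be sketched in base R. This is only an illustration under stated assumptions: the bins are taken to start at 0 and the data are treated as uniform within each bin, neither of which is stated above, and the names closedEdges, fallbackMean, and pIndexDefault are hypothetical. The last line checks that the default pIndex in the usage line agrees with ln(5)/ln(4).

```r
# Bin edges and counts from the Cook County example on this page.
bEdges  <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
             50000, 60000, 75000, 100000, 125000, 150000, 200000, NA)
bCounts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
             79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)

n <- length(bEdges)
closedEdges <- bEdges
closedEdges[n] <- 2 * bEdges[n - 1]   # temporarily set e_n = 2 * e_{n-1}

# Assumption: the first bin starts at 0 and data are uniform within each bin,
# so each bin contributes its midpoint weighted by its share of the counts.
leftEdges <- c(0, closedEdges[-n])
midpoints <- (leftEdges + closedEdges) / 2
fallbackMean <- sum(midpoints * bCounts) / sum(bCounts)

# The default pIndex = 1.160964 is ln(5)/ln(4) rounded to six decimals.
pIndexDefault <- log(5) / log(4)
```

Supplying a real estimate for m, as the example at the end of this page does, avoids relying on this rough closure of the top bin.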
First, a step function PDF is created, as described in stepbins. The bins of the resulting PDF are then recursively subdivided and shifted in a manner that preserves the area of the original bins, resulting in a step function with finer bins.
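To make the idea of area-preserving subdivision concrete, here is a minimal sketch in base R. It is not the rule rsubbins applies: the helper splitBinsOnce and its bump parameter are hypothetical, and eps1 (the edge shift) is not modelled at all. Each bin is split into three sub-bins, the middle one taking a fraction eps2 of the width, and the heights are adjusted so that the three sub-bins together keep the area of the bin they came from.

```r
# Hypothetical helper: split every bin into three sub-bins, preserving its area.
# NOT the rule used by rsubbins; it only illustrates area preservation.
splitBinsOnce <- function(edges, heights, eps2 = 0.75, bump = 0.1) {
  k <- length(edges)
  widths <- diff(edges)
  side <- (1 - eps2) / 2 * widths          # width of each outer sub-bin
  left <- edges[-k]
  # Interleave the three left endpoints of each bin, then add the final edge.
  newEdges <- c(as.vector(rbind(left, left + side, left + side + eps2 * widths)),
                edges[k])
  # Raise the middle sub-bin and lower the outer ones so the bin area is unchanged.
  mid <- heights * (1 + bump)
  outer <- heights * (1 - bump * eps2 / (1 - eps2))
  list(edges = newEdges, heights = as.vector(rbind(outer, mid, outer)))
}

edges <- c(0, 10, 30, 60)
heights <- c(0.02, 0.025, 0.01)            # bin areas 0.2, 0.5, 0.3

fine <- list(edges = edges, heights = heights)
for (i in 1:3) {                           # subdivide three times (depth = 3)
  fine <- splitBinsOnce(fine$edges, fine$heights)
}

originalArea <- sum(diff(edges) * heights)
fineArea <- sum(diff(fine$edges) * fine$heights)
```

After three rounds the three original bins become 81 sub-bins, while the total area stays the same up to rounding.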
The methods stepbins and rsubbins are included in this package mainly for the purpose of comparison. For most use cases, splinebins will produce more accurate smoothing results.
Returns a list with the following components.
rsubPDF 
A step function (stepfun object) giving the recursive subdivision PDF.
rsubCDF 
A piecewise-linear function giving the CDF of the recursive subdivision PDF.
E 
The right-hand endpoint of the support of the PDF.
shrinkFactor 
If the supplied estimate for the mean is too small to be fitted with a step function, the bin edges will be scaled by shrinkFactor.
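The relationship between the returned components can be illustrated without calling rsubbins. The sketch below builds a stand-in step PDF and its piecewise-linear CDF with base R's stepfun and approxfun; the toy edges and heights and the names stepPDF, stepCDF, medianEst, and meanEst are assumptions made for illustration, not output of this package. The same root-finding and integration calls should work on rsubCDF and E from an actual fit.

```r
# Stand-in step PDF on [0, E]; heights chosen so the total area is 1.
edges <- c(0, 10, 30, 60)
heights <- c(0.02, 0.025, 0.01)
E <- edges[length(edges)]

# stepfun needs one more value than knots: the density is 0 outside [0, E].
stepPDF <- stepfun(edges, c(0, heights, 0))

# Integrating a step function gives a piecewise-linear CDF with knots at the edges.
cdfAtEdges <- c(0, cumsum(diff(edges) * heights))
stepCDF <- approxfun(edges, cdfAtEdges, yleft = 0, yright = 1)

# Median by root-finding on the CDF; mean as the integral of 1 - CDF over [0, E].
medianEst <- uniroot(function(x) stepCDF(x) - 0.5, lower = 0, upper = E)$root
meanEst <- integrate(function(x) 1 - stepCDF(x), 0, E)$value
```

For this toy density the median comes out near 22 and the mean near 24.5, which matches the midpoint-weighted average of the three bins.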
David J. Hunter and McKalie Drown
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
rsb <- rsubbins(binedges, bincounts, 76091, tailShape="pareto")
plot(rsb$rsubPDF, do.points=FALSE)
plot(rsb$rsubCDF, 0, rsb$E)
library(pracma)
integral(rsb$rsubPDF, 0, rsb$E) # total area under the PDF
integral(function(x){1-rsb$rsubCDF(x)}, 0, rsb$E) # mean is approximated