splinebins {binsmooth} | R Documentation |
Creates a smooth cubic spline CDF and piecewise-quadratic PDF based on a set of binned data (edges and counts).
splinebins(bEdges, bCounts, m = NULL, numIterations = 16, monoMethod = c("hyman", "monoH.FC"))
bEdges |
A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored and assumed to be |
bCounts |
A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative. |
m |
An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated. |
numIterations |
The number of iterations performed by a binary search that optimizes the CDF to fit the mean. |
monoMethod |
The method for constructing a monotone spline. Must be one of "hyman" or "monoH.FC". |
Fits a monotone cubic spline to the points specified by the binned data to produce a smooth cumulative distribution function. The PDF is then obtained by differentiating, so it will be piecewise quadratic and preserve the area of each bin.
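The idea in the paragraph above can be sketched with base R alone. This is a minimal illustration, not the package's implementation: the toy edges and counts are made up, the names cdf_fit and pdf_fit are hypothetical, and the top bin is closed at an assumed upper bound instead of being fitted to a mean.

```r
# Hypothetical toy data: right bin edges and counts.
# The last edge is an assumed finite upper bound, not an unbounded top bin.
edges  <- c(10, 20, 30, 50)
counts <- c(20, 40, 30, 10)

# CDF knots: (0, 0) followed by (e_i, cumulative share of the data up to e_i).
x <- c(0, edges)
y <- c(0, cumsum(counts) / sum(counts))

# Monotone cubic interpolation through the knots (stats::splinefun).
cdf_fit <- splinefun(x, y, method = "monoH.FC")

# The PDF is the first derivative of the cubic CDF, so it is piecewise quadratic.
pdf_fit <- function(t) cdf_fit(t, deriv = 1)

# The CDF passes through every knot, so the area under the PDF over each bin
# equals that bin's share of the counts.
bin_area  <- diff(cdf_fit(x))
bin_share <- counts / sum(counts)
print(cbind(bin_area, bin_share))
```

Swapping method = "monoH.FC" for method = "hyman" gives the other monotone construction named in the usage line; both are options of stats::splinefun.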
Returns a list with the following components.
splinePDF |
A piecewise-quadratic function giving the fitted PDF. |
splineCDF |
A piecewise-cubic function giving the CDF. |
E |
The right-hand endpoint of the support of the PDF. |
shrinkFactor |
If the supplied estimate for the mean is too small to be fitted with our method, the bin edges will be scaled by this factor. |
splineInvCDF |
An approximate inverse of splineCDF. |
fitWarn |
Flag set to |
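To show how components such as splineCDF, E and splineInvCDF relate to one another, here is a hedged sketch in base R. A stand-in CDF is built with stats::splinefun on made-up knots, so it runs without the package. The names cdf_fit, E_fit and inv_cdf are hypothetical, and the root-finding inverse is an assumption for illustration; the source does not describe how the package computes splineInvCDF.

```r
# Stand-in for a fitted CDF on [0, E_fit]; the knots are hypothetical.
x <- c(0, 10, 20, 30, 50)
y <- c(0, 0.2, 0.6, 0.9, 1)
cdf_fit <- splinefun(x, y, method = "hyman")
E_fit <- max(x)   # right-hand endpoint of the support

# For a nonnegative variable, the mean equals the integral of 1 - CDF
# over the support.
mean_fit <- integrate(function(t) 1 - cdf_fit(t), 0, E_fit)$value

# Approximate inverse CDF: for each probability q, solve cdf_fit(t) = q
# on [0, E_fit] by root finding.
inv_cdf <- function(p) {
  vapply(p, function(q) {
    uniroot(function(t) cdf_fit(t) - q, c(0, E_fit), tol = 1e-9)$root
  }, numeric(1))
}

quartiles <- inv_cdf(c(0.25, 0.5, 0.75))
print(mean_fit)
print(quartiles)
```

With a real fit, the same two calculations would use the returned splineCDF and E in place of cdf_fit and E_fit.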
David J. Hunter and McKalie Drown
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
sb <- stepbins(binedges, bincounts, 76091)
splb <- splinebins(binedges, bincounts, 76091)
plot(splb$splinePDF, 0, 300000, n=500)
plot(sb$stepPDF, do.points=FALSE, col="gray", add=TRUE)
# notice that the curve preserves bin area
library(pracma)
integral(splb$splinePDF, 0, splb$E)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # should be the mean
splb <- splinebins(binedges, bincounts, 76091, numIterations=20)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # closer to given mean