splinebins {binsmooth}    R Documentation
Optimized spline PDF and CDF fitted to binned data
Description
Creates a smooth cubic spline CDF and a piecewise-quadratic PDF from a set of binned data (bin edges and counts).
Usage
splinebins(bEdges, bCounts, m = NULL,
numIterations = 16, monoMethod = c("hyman", "monoH.FC"))
Arguments
bEdges
A vector of bin edges.
bCounts
A vector of bin counts.
m
An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting …
numIterations
The number of iterations performed by a binary search that optimizes the CDF to fit the mean.
monoMethod
The method for constructing a monotone spline. Must be one of "hyman" or "monoH.FC".
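The binary search controlled by numIterations can be illustrated with a base-R sketch. It rests on assumptions this page does not confirm: the quantity being searched is the tail endpoint E, the edges are right-hand bin endpoints with NA for the open top bin, and the first bin starts at 0. The helpers spline_mean() and fit_tail_endpoint() are hypothetical, and the upper bracket of 10 times the last finite edge is an arbitrary choice. The mean of a distribution supported on [0, E] is the integral of 1 - F(t) over that interval, the same quantity the Examples check with pracma::integral.

```r
# Hypothetical sketch of mean-matching by bisection; not binsmooth's source.
# Assumptions: edges are right-hand bin endpoints, the first bin starts at 0,
# the last edge is NA (open top bin), and the search variable is the tail
# endpoint E of the support.
binedges <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
              50000, 60000, 75000, 100000, 125000, 150000, 200000, NA)
bincounts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
               79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)

# Mean of the spline-CDF distribution on [0, E], computed as the integral
# of 1 - F(t) from 0 to E.
spline_mean <- function(edges, counts, E, method = "hyman") {
  x <- c(0, edges[-length(edges)], E)       # knots: 0, finite edges, E
  y <- c(0, cumsum(counts) / sum(counts))   # cumulative proportions
  s <- splinefun(x, y, method = method)     # monotone cubic CDF
  integrate(function(t) 1 - s(t), 0, E, subdivisions = 500L)$value
}

# Bisection on E: stretching the open top bin is expected to raise the mean,
# so raise the lower bound when the mean is too small, else lower the upper.
fit_tail_endpoint <- function(edges, counts, m, num_iterations = 16,
                              method = "hyman") {
  lo <- edges[length(edges) - 1]  # E must exceed the last finite edge
  hi <- 10 * lo                   # arbitrary upper bracket (an assumption)
  for (i in seq_len(num_iterations)) {
    mid <- (lo + hi) / 2
    if (spline_mean(edges, counts, mid, method) < m) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}

E_hat <- fit_tail_endpoint(binedges, bincounts, m = 76091)
E_hat
spline_mean(binedges, bincounts, E_hat)  # compare with the target m = 76091
```

Each iteration halves the bracket, so after 16 iterations the uncertainty in E is the initial bracket width divided by 2^16. Raising the iteration count narrows it further, which matches the behaviour shown at the end of the Examples.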
Details
Fits a monotone cubic spline to the points specified by the binned data to produce a smooth cumulative distribution function. The PDF is then obtained by differentiating, so it will be piecewise quadratic and preserve the area of each bin.
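The construction described above can be sketched with stats::splinefun(), whose method argument accepts "hyman" and "monoH.FC", the same names as monoMethod. This is an illustration under stated assumptions, not binsmooth's source: spline_cdf_pdf() is a hypothetical helper, the knots are taken to be 0, the finite bin edges, and a caller-supplied E, and E = 500000 is an arbitrary value rather than a fitted one. Because the spline interpolates the cumulative proportions at the knots, integrating its derivative over a bin returns that bin's share of the counts.

```r
# Illustrative sketch of the Details construction with base R; not the
# package's own code. Assumptions: edges are right-hand bin endpoints with NA
# for the open top bin, the first bin starts at 0, and E is supplied directly.
spline_cdf_pdf <- function(edges, counts, E, method = c("hyman", "monoH.FC")) {
  method <- match.arg(method)
  x <- c(0, edges[-length(edges)], E)       # knots: 0, finite edges, E
  y <- c(0, cumsum(counts) / sum(counts))   # cumulative proportions at knots
  s <- splinefun(x, y, method = method)     # monotone piecewise-cubic CDF
  cdf <- function(t) ifelse(t <= 0, 0, ifelse(t >= E, 1, s(t)))
  # The derivative of a piecewise cubic is piecewise quadratic.
  pdf <- function(t) ifelse(t < 0 | t > E, 0, s(t, deriv = 1))
  list(cdf = cdf, pdf = pdf, knots = x)
}

binedges <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
              50000, 60000, 75000, 100000, 125000, 150000, 200000, NA)
bincounts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
               79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)
fit <- spline_cdf_pdf(binedges, bincounts, E = 500000)  # E picked arbitrarily

# Area under the PDF over the first bin [0, 10000] versus that bin's share.
integrate(fit$pdf, 0, 10000)$value
bincounts[1] / sum(bincounts)
```

Monotonicity of the spline is what keeps the derivative non-negative; an ordinary interpolating cubic through the same points could dip below zero between knots.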
Value
Returns a list with the following components.
splinePDF
A piecewise-quadratic function giving the fitted PDF.
splineCDF
A piecewise-cubic function giving the CDF.
E
The right-hand endpoint of the support of the PDF.
shrinkFactor
If the supplied estimate for the mean is too small to be fitted with our method, the bin edges will be scaled by shrinkFactor.
splineInvCDF
An approximate inverse of splineCDF.
fitWarn
Flag set to …
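One simple way to build an approximate inverse of a monotone CDF is to tabulate the CDF on a fine grid and interpolate the pairs in reverse. The sketch below does this with stats::approxfun(); approx_inv_cdf() is a hypothetical helper, and this page does not say how splineInvCDF is actually constructed. The CDF here is a "hyman" spline through the example data, assuming right-hand bin edges, a first bin starting at 0, and an arbitrary tail endpoint E = 500000.

```r
# Hypothetical sketch of an approximate inverse CDF; not binsmooth's code.
binedges <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
              50000, 60000, 75000, 100000, 125000, 150000, 200000, NA)
bincounts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
               79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)
E <- 500000  # arbitrary tail endpoint chosen for illustration

# Monotone spline CDF through (edge, cumulative proportion).
x <- c(0, binedges[-length(binedges)], E)
y <- c(0, cumsum(bincounts) / sum(bincounts))
cdf <- splinefun(x, y, method = "hyman")

# Tabulate the CDF on a grid over [0, E] and swap the roles of x and y.
# ties = mean averages grid points that share a CDF value (flat stretches);
# rule = 2 clamps probabilities outside the tabulated range.
approx_inv_cdf <- function(cdf, E, n = 10001) {
  t <- seq(0, E, length.out = n)
  approxfun(cdf(t), t, ties = mean, rule = 2)
}

inv <- approx_inv_cdf(cdf, E)
inv(c(0.25, 0.5, 0.75))  # approximate quartiles
```

The grid size n trades accuracy for setup cost; because the CDF is smooth between knots, linear interpolation on a grid this fine is already very close to the exact inverse.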
Author(s)
David J. Hunter and McKalie Drown
References
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
Examples
library(binsmooth)
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
sb <- stepbins(binedges, bincounts, 76091)
splb <- splinebins(binedges, bincounts, 76091)
plot(splb$splinePDF, 0, 300000, n=500)
plot(sb$stepPDF, do.points=FALSE, col="gray", add=TRUE)
# notice that the spline PDF (black) preserves the area of each bin of the step PDF (gray)
library(pracma)
integral(splb$splinePDF, 0, splb$E) # total area under the PDF; should be 1
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # should be the mean
splb <- splinebins(binedges, bincounts, 76091, numIterations=20)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # closer to given mean