splinebins {binsmooth} R Documentation

Optimized spline PDF and CDF fitted to binned data

Description

Creates a smooth cubic spline CDF and piecewise-quadratic PDF based on a set of binned data (edges and counts).

Usage

splinebins(bEdges, bCounts, m = NULL,
numIterations = 16, monoMethod = c("hyman", "monoH.FC"))

Arguments

 bEdges: A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored and assumed to be Inf or NA, indicating that the top bin is unbounded. The edges determine n bins on the intervals e_{i-1} ≤ x ≤ e_i, where e_0 is assumed to be 0.

 bCounts: A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative.

 m: An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated.

 numIterations: The number of iterations performed by a binary search that optimizes the CDF to fit the mean.

 monoMethod: The method for constructing a monotone spline. Must be one of "hyman" or "monoH.FC". The former choice tends to integrate faster and produce smoother density functions. See splinefun for more details.
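The roles of m and numIterations can be illustrated with a minimal base-R sketch. It assumes that the binary search adjusts the right-hand endpoint E of the top bin until the mean of the fitted distribution equals m; the helper names fit_cdf, mean_at and match_mean, and the search bracket [200000, 2000000], are hypothetical and are not the package's internals. The mean uses the identity mean = E - (integral of F(x) over [0, E]), which Simpson's rule evaluates exactly on each cubic piece of the spline.

```r
# Hypothetical sketch of mean matching; not the binsmooth source code.
# Data from the Examples section (the unbounded top edge is omitted).
edges  <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
            50000, 60000, 75000, 100000, 125000, 150000, 200000)
counts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
            79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)
# Cumulative proportions at 0, e_1, ..., e_{n-1}, E
y <- c(0, cumsum(counts) / sum(counts))

# Monotone cubic CDF for a given top endpoint E (assumed > last finite edge)
fit_cdf <- function(E) splinefun(c(0, edges, E), y, method = "hyman")

# Mean of a distribution on [0, E]: E minus the integral of F over [0, E].
# Simpson's rule is exact for each cubic piece of the spline.
mean_at <- function(E) {
  x <- c(0, edges, E)
  cdf <- fit_cdf(E)
  a <- head(x, -1)
  b <- tail(x, -1)
  E - sum((b - a) / 6 * (cdf(a) + 4 * cdf((a + b) / 2) + cdf(b)))
}

# Binary search on E, assuming mean_at(lo) < m <= mean_at(hi)
match_mean <- function(m, lo, hi, numIterations = 16) {
  for (k in seq_len(numIterations)) {
    mid <- (lo + hi) / 2
    if (mean_at(mid) < m) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}

E_hat <- match_mean(76091, lo = 200000, hi = 2000000)
mean_at(E_hat)   # should be close to 76091
```

Each iteration halves the bracket, so 16 iterations narrow an interval of width 1800000 to roughly 27.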

Details

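Fits a monotone cubic spline through the points (0, 0), (e_i, (c_1 + … + c_i)/N) for i = 1, …, n-1, and (E, 1), where N = c_1 + … + c_n is the total count, to produce a smooth cumulative distribution function. The right-hand endpoint E is chosen by a binary search of numIterations steps so that the mean of the fitted distribution matches m. The PDF is then obtained by differentiating the CDF, so it is piecewise quadratic. Because the CDF passes through the cumulative proportions at the bin edges, the area under the PDF over each bin equals that bin's share c_i/N of the total count.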
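The spline construction can be sketched with base R's splinefun, which offers both monotone methods named by monoMethod. This sketch skips the search for E and instead fixes it at 2e_{n-1}, mirroring the temporary choice described under m; the names cdf, pdf and bin_mass are hypothetical.

```r
# Hypothetical sketch of the spline construction; not the binsmooth source.
edges  <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
            50000, 60000, 75000, 100000, 125000, 150000, 200000)
counts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
            79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)
E <- 2 * edges[length(edges)]   # assumed top endpoint, 2 * e_{n-1}

# Knots (0, 0), (e_i, cumulative proportion), ..., (E, 1)
x <- c(0, edges, E)
y <- c(0, cumsum(counts) / sum(counts))

cdf <- splinefun(x, y, method = "hyman")   # monotone piecewise-cubic CDF
pdf <- function(t) cdf(t, deriv = 1)       # its derivative: piecewise quadratic

# Probability mass of each bin under the fitted CDF
bin_mass <- diff(cdf(x))
all.equal(bin_mass, counts / sum(counts))   # TRUE: bin areas are preserved
```

Calling plot(pdf, 0, 300000, n = 500) draws the resulting density in the same way as the Examples section.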

Value

Returns a list with the following components.

 splinePDF: A piecewise-quadratic function giving the fitted PDF.

 splineCDF: A piecewise-cubic function giving the CDF.

 E: The right-hand endpoint of the support of the PDF.

 shrinkFactor: If the supplied estimate for the mean is too small to be fitted with our method, the bin edges will be scaled by shrinkFactor, which will be chosen slightly less than 1.

 splineInvCDF: An approximate inverse of splineCDF.

 fitWarn: Flag set to TRUE if the fitted median falls in the wrong bin, i.e., outside the bin that contains the median of the binned data.
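One way to build an approximate inverse of a fitted CDF, and to perform a median-bin check of the kind fitWarn describes, is sketched below in base R. The root-finding approach with uniroot, the fixed endpoint E = 2e_{n-1}, and the names inv_cdf and med are assumptions for illustration; the package may construct splineInvCDF differently.

```r
# Hypothetical sketch of an approximate inverse CDF; not the binsmooth source.
edges  <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
            50000, 60000, 75000, 100000, 125000, 150000, 200000)
counts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
            79816, 153581, 195430, 240948, 155139, 94527, 92166, 103217)
E <- 2 * edges[length(edges)]   # assumed top endpoint
x <- c(0, edges, E)
y <- c(0, cumsum(counts) / sum(counts))
cdf <- splinefun(x, y, method = "hyman")

# Solve cdf(q) = p numerically for each p in (0, 1)
inv_cdf <- function(p) {
  vapply(p, function(pp) uniroot(function(q) cdf(q) - pp, c(0, E))$root,
         numeric(1))
}

med <- inv_cdf(0.5)
# Compare the bin containing the fitted median with the bin whose
# cumulative-proportion range contains 0.5
findInterval(med, x) == findInterval(0.5, y)   # TRUE for these data
```

Because this sketch's CDF interpolates the cumulative proportions exactly, the fitted median always lands in the (50000, 60000] bin here.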

Author(s)

David J. Hunter and McKalie Drown

References

Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/

Examples

library(binsmooth)

# 2005 ACS data from Cook County, Illinois
# The last edge is NA because the top bin is unbounded
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
# Step-function fit (for comparison) and spline fit, both with mean 76091
sb <- stepbins(binedges, bincounts, 76091)
splb <- splinebins(binedges, bincounts, 76091)

# Plot the fitted piecewise-quadratic PDF
plot(splb$splinePDF, 0, 300000, n=500)