splinebins {binsmooth} R Documentation

## Optimized spline PDF and CDF fitted to binned data

### Description

Creates a smooth cubic spline CDF and piecewise-quadratic PDF based on a set of binned data (edges and counts).

### Usage

```r
splinebins(bEdges, bCounts, m = NULL,
           numIterations = 16, monoMethod = c("hyman", "monoH.FC"))
```

### Arguments

 `bEdges` A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored and assumed to be `Inf` or `NA`, indicating that the top bin is unbounded. The edges determine n bins on the intervals e_{i-1} ≤ x ≤ e_i, where e_0 is assumed to be 0.

 `bCounts` A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative.

 `m` An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated.

 `numIterations` The number of iterations performed by a binary search that optimizes the CDF to fit the mean.

 `monoMethod` The method for constructing a monotone spline. Must be one of `"hyman"` or `"monoH.FC"`. The former choice tends to integrate faster and produce smoother density functions. See `splinefun` for more details.

### Details

Fits a monotone cubic spline to the points specified by the binned data to produce a smooth cumulative distribution function. The PDF is then obtained by differentiating, so it will be piecewise quadratic and preserve the area of each bin.
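The idea can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it uses hypothetical toy data, assumes a finite upper bound of 50 for the top bin, and skips the mean-matching search that `splinebins` performs. The names `cdf_fit` and `pdf_fit` are illustrative only.

```r
# Hypothetical toy data: right bin edges and counts.
# Assumption: the top bin is closed at 50 (splinebins chooses this bound itself).
edges  <- c(10, 20, 30, 50)
counts <- c(20, 40, 30, 10)

# Empirical CDF values at 0 and at each bin edge
x <- c(0, edges)
y <- c(0, cumsum(counts) / sum(counts))

# Monotone cubic interpolation (Hyman filtering) from the stats package
cdf_fit <- splinefun(x, y, method = "hyman")

# The first derivative of the cubic spline is a piecewise-quadratic density
pdf_fit <- function(t) cdf_fit(t, deriv = 1)

# The area under the density over each bin matches that bin's share of the counts
bin_area <- sapply(seq_along(edges), function(i) {
  integrate(pdf_fit, lower = x[i], upper = x[i + 1])$value
})
print(round(bin_area, 4))
print(counts / sum(counts))
```

Because the spline interpolates the cumulative proportions at every bin edge, the integral of its derivative over a bin equals the difference of the CDF values at that bin's endpoints, which is why bin areas are preserved.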

### Value

Returns a list with the following components.

 `splinePDF` A piecewise-quadratic function giving the fitted PDF.

 `splineCDF` A piecewise-cubic function giving the CDF.

 `E` The right-hand endpoint of the support of the PDF.

 `shrinkFactor` If the supplied estimate for the mean is too small to be fitted with our method, the bin edges will be scaled by `shrinkFactor`, which will be chosen less than (and close to) 1.

 `splineInvCDF` An approximate inverse of `splineCDF`.

 `fitWarn` Flag set to `TRUE` if the fitted median falls in the wrong bin.
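To show what an approximate inverse CDF can be used for, here is a hedged sketch in base R. It does not reproduce how `splineInvCDF` is built; it inverts a toy monotone spline by root finding with `uniroot`, and `cdf_fit` and `inv_cdf` are hypothetical names.

```r
# Toy monotone CDF on [0, 50] (hypothetical cumulative proportions)
x <- c(0, 10, 20, 30, 50)
y <- c(0, 0.2, 0.6, 0.9, 1)
cdf_fit <- splinefun(x, y, method = "hyman")

# Approximate inverse: solve cdf_fit(t) = p for t on the support [0, 50]
inv_cdf <- function(p) {
  vapply(p, function(pi) {
    uniroot(function(t) cdf_fit(t) - pi, lower = 0, upper = 50, tol = 1e-8)$root
  }, numeric(1))
}

# The fitted median, and a check that it maps back to 0.5
median_est <- inv_cdf(0.5)
print(median_est)
print(cdf_fit(median_est))
```

In this toy example the cumulative proportion passes 0.5 between the edges 10 and 20, so the fitted median lands in that bin.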

### Author(s)

David J. Hunter and McKalie Drown

### References

Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/

### Examples

```r
library(binsmooth)
library(pracma)  # provides integral()

# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
sb <- stepbins(binedges, bincounts, 76091)
splb <- splinebins(binedges, bincounts, 76091)

plot(splb$splinePDF, 0, 300000, n=500)
plot(sb$stepPDF, do.points=FALSE, col="gray", add=TRUE)
# notice that the curve preserves bin area

integral(splb$splinePDF, 0, splb$E) # total area under the PDF; should be close to 1
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # should be the mean
splb <- splinebins(binedges, bincounts, 76091, numIterations=20)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # closer to given mean
```

[Package binsmooth version 0.2.2 Index]