stepbins {binsmooth} | R Documentation |
Step function PDF and CDF fitted to binned data
Description
Creates a step function PDF and CDF based on a set of binned data (edges and counts).
Usage
stepbins(bEdges, bCounts, m = NULL,
         tailShape = c("onebin", "pareto", "exponential"),
         nTail = 16, numIterations = 20, pIndex = 1.160964, tbRatio = 0.8)
Arguments
bEdges |
A vector |
bCounts |
A vector |
m |
An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting |
tailShape |
Must be one of |
nTail |
The number of bins to use to form the tail. Ignored if |
numIterations |
The number of iterations to optimize the tail to fit the mean. Ignored if |
pIndex |
The Pareto index for the shape of the tail. Defaults to |
tbRatio |
The decay ratio for the tail bins. Ignored unless |
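As a side note on the default shown in Usage, the value 1.160964 for pIndex coincides numerically with log(5)/log(4), the Pareto index commonly associated with the 80-20 rule. The short check below uses base R only. The variable names are illustrative and not part of the package, and the top-share formula is the standard Lorenz-curve result for a Pareto distribution rather than anything taken from this page.

```r
# Hypothetical names; base R only.
p_index <- log(5) / log(4)
print(round(p_index, 6))

# Under a Pareto distribution with index alpha, the share of the total
# held by the top fraction q is q^(1 - 1/alpha).
top_share <- 0.2^(1 - 1 / p_index)
print(top_share)  # share held by the top 20%
```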
Details
We assume that the left endpoint of the first bin is 0 and that the top bin is unbounded. Options exist to replace the top bin with a single bin or a sequence of bins in the shape of a Pareto or exponential tail. If an estimate for the population mean is supplied, the density functions are fitted to match it.
The methods stepbins and rsubbins are included in this package mainly for the purpose of comparison. For most use cases, splinebins will produce more accurate smoothing results.
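To make the assumptions above concrete, here is a minimal base-R sketch of a step PDF and a piecewise-linear CDF built from bin edges and counts. It is not the package's implementation: the toy data, the fixed right endpoint E, and all variable names are assumptions chosen for illustration, and no mean-matching is performed.

```r
# Toy binned data: three bounded bins plus an unbounded top bin (NA edge).
edges  <- c(10, 20, 40, NA)
counts <- c(20, 30, 40, 10)

# Assumption: close the top bin at an arbitrary right endpoint E.
E <- 80
right  <- c(edges[-length(edges)], E)   # right edge of each bin
left   <- c(0, right[-length(right)])   # first bin starts at 0
widths <- right - left

# Bin height = bin probability / bin width, so the total area is 1.
heights <- counts / sum(counts) / widths

# Step PDF: constant on each bin, zero outside [0, E).
step_pdf <- stepfun(c(left, E), c(0, heights, 0))

# Piecewise-linear CDF: interpolate cumulative proportions at the bin edges.
cum <- c(0, cumsum(counts) / sum(counts))
step_cdf <- approxfun(c(left, E), cum, yleft = 0, yright = 1)

total_area <- sum(heights * widths)
bin_mean   <- sum(heights * widths * (left + right) / 2)

# The mean can also be recovered by integrating 1 - CDF over [0, E].
mean_from_cdf <- integrate(function(x) 1 - step_cdf(x), 0, E)$value
```

In this sketch the mean is whatever the arbitrary choice of E implies. Fitting a supplied mean would mean choosing the tail so that bin_mean equals that estimate.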
Value
Returns a list with the following components.
stepPDF |
A |
stepCDF |
A piecewise-linear |
E |
The right-hand endpoint of the support of the PDF. |
shrinkFactor |
If the supplied estimate for the mean is too small to be fitted with a step function, the bin edges will be scaled by |
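The relationship between E and the supplied mean can be illustrated with a little arithmetic for the single-bin tail. The sketch below assumes that the top bin is closed as one uniform bin [a, E] and solves for the E that makes the step-function mean equal to m. Whether stepbins computes E in exactly this way is an assumption, and the helper onebin_endpoint is hypothetical, not a function of the package.

```r
# Hypothetical helper: right endpoint of a single uniform top bin
# chosen so that the step-function mean equals m.
onebin_endpoint <- function(edges, counts, m) {
  k <- length(counts)
  p <- counts / sum(counts)             # bin probabilities
  right <- edges[-k]                    # finite right edges of the bounded bins
  left  <- c(0, right[-(k - 1)])        # first bin starts at 0
  lower_mean <- sum(p[-k] * (left + right) / 2)  # contribution of bounded bins
  a <- right[k - 1]                     # left edge of the top bin
  # Solve lower_mean + p[k] * (a + E) / 2 = m for E.
  2 * (m - lower_mean) / p[k] - a
}

edges  <- c(10, 20, 40, NA)
counts <- c(20, 30, 40, 10)
E_fit  <- onebin_endpoint(edges, counts, m = 25)

# Recompute the mean with the fitted endpoint as a check.
p <- counts / sum(counts)
fitted_mean <- sum(p * (c(0, 10, 20, 40) + c(10, 20, 40, E_fit)) / 2)
```

If the computed endpoint does not exceed a, the mean estimate is too small for a single top bin under this sketch, which is the kind of situation the shrinkFactor component refers to.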
Author(s)
David J. Hunter and McKalie Drown
References
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
Examples
# 2005 ACS data from Cook County, Illinois
# The final NA edge marks the unbounded top bin.
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
# Fit with the default one-bin tail and with a Pareto tail;
# the third argument (76091) is the mean estimate m.
sb <- stepbins(binedges, bincounts, 76091)
sbpt <- stepbins(binedges, bincounts, 76091, tailShape="pareto")
# Plot the fitted PDFs and the CDF.
plot(sb$stepPDF)
plot(sbpt$stepPDF, do.points=FALSE)
plot(sb$stepCDF, 0, sb$E+100000)
# Numerical checks using integral() from the pracma package.
library(pracma)
integral(sb$stepPDF, 0, sb$E) # should be approximately 1
integral(function(x){1-sb$stepCDF(x)}, 0, sb$E) # should be the mean