stepbins {binsmooth} R Documentation

Step function PDF and CDF fitted to binned data

Description

Creates a step function PDF and CDF based on a set of binned data (edges and counts).

Usage

stepbins(bEdges, bCounts, m = NULL,
tailShape = c("onebin", "pareto", "exponential"),
nTail = 16, numIterations = 20, pIndex = 1.160964, tbRatio = 0.8)

Arguments

 bEdges
    A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored and assumed to be Inf or NA, indicating that the top bin is unbounded. The edges determine n bins on the intervals e_{i-1} ≤ x ≤ e_i, where e_0 is assumed to be 0.

 bCounts
    A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative.

 m
    An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated.

 tailShape
    Must be one of "onebin", "pareto", or "exponential".

 nTail
    The number of bins to use to form the tail. Ignored if tailShape equals "onebin".

 numIterations
    The number of iterations to optimize the tail to fit the mean. Ignored if tailShape equals "onebin".

 pIndex
    The Pareto index for the shape of the tail. Defaults to ln(5)/ln(4). Ignored unless tailShape equals "pareto".

 tbRatio
    The decay ratio for the tail bins. Ignored unless tailShape equals "exponential".
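The defaults described above can be checked with a few lines of base R. This sketch confirms that the default pIndex equals ln(5)/ln(4) and shows one plausible way a fallback mean could be computed when m is omitted. The midpoint-weighted average is an assumption made for illustration, since the documentation only states that e_n is temporarily set to 2e_{n-1}; the package's internal calculation may differ. The names edges, counts, and m_guess, and the toy data, are hypothetical.

```r
# Default Pareto index: ln(5)/ln(4)
p_index <- log(5) / log(4)
round(p_index, 6)

# Hypothetical toy bins; the last edge is NA because the top bin is unbounded
edges  <- c(10, 20, 40, NA)
counts <- c(5, 10, 4, 1)

# Temporarily close the top bin at twice the last finite edge
n <- length(edges)
closed <- edges
closed[n] <- 2 * edges[n - 1]

# Assumption (not confirmed by the documentation): estimate the mean
# from bin midpoints weighted by the bin counts
left    <- c(0, closed[-n])
mids    <- (left + closed) / 2
m_guess <- sum(mids * counts) / sum(counts)
m_guess
```

Supplying m explicitly avoids both the warning and the dependence on this arbitrary closing edge.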

Details

We assume that the left endpoint of the first bin is 0 and that the top bin is unbounded. Options exist to replace the top bin with a single bin or with a sequence of bins in the shape of a Pareto or exponential tail. If an estimate of the population mean is supplied, the fitted density is adjusted to match it.
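To make the idea of a step function PDF with a piecewise-linear CDF concrete, here is a minimal base R sketch. It assumes the top bin has already been closed at a hypothetical right endpoint (80 in the toy data) and does no mean matching, so it is an illustration of the general construction rather than the package's implementation. The names edges, counts, step_pdf, and step_cdf are hypothetical.

```r
# Hypothetical toy bins, with the top bin already closed at 80
edges  <- c(10, 20, 40, 80)
counts <- c(5, 10, 4, 1)

left    <- c(0, edges[-length(edges)])      # left endpoints, starting at 0
widths  <- edges - left
heights <- counts / (sum(counts) * widths)  # constant density on each bin

# Step PDF: 0 below 0, one height per bin, 0 above the last edge
step_pdf <- stepfun(c(0, edges), c(0, heights, 0))

# Piecewise-linear CDF through the cumulative proportions at the edges
step_cdf <- approxfun(c(0, edges), c(0, cumsum(counts) / sum(counts)),
                      yleft = 0, yright = 1)

sum(heights * widths)  # total area under the step PDF
step_pdf(15)           # density inside the second bin
step_cdf(20)           # proportion of the data at or below 20
```

Because the density is constant within each bin, the CDF is linear between edges, which is why an interpolating function is a natural representation for it.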

The methods stepbins and rsubbins are included in this package mainly for the purpose of comparison. For most use cases, splinebins will produce more accurate smoothing results.

Value

Returns a list with the following components.

 stepPDF
    A stepfun function giving the fitted PDF.

 stepCDF
    A piecewise-linear approxfun function giving the CDF.

 E
    The right-hand endpoint of the support of the PDF.

 shrinkFactor
    If the supplied estimate for the mean is too small to be fitted with a step function, the bin edges will be scaled by shrinkFactor, which will be chosen less than (and close to) 1.
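A CDF of this kind can be inverted numerically to obtain quantiles. The sketch below uses a hypothetical stand-in, toy_cdf, built with approxfun so that it runs without the package; with a real fit, the stepCDF and E components of the returned list would presumably take the place of toy_cdf and E. The data and variable names are illustrative only.

```r
# Hypothetical stand-in for a fitted piecewise-linear CDF
edges   <- c(0, 10, 20, 40, 80)
cum_p   <- c(0, 0.25, 0.75, 0.95, 1)
toy_cdf <- approxfun(edges, cum_p, yleft = 0, yright = 1)
E       <- max(edges)   # right-hand endpoint of the support

# Solve CDF(x) = 0.5 on [0, E] to find the median
med <- uniroot(function(x) toy_cdf(x) - 0.5,
               lower = 0, upper = E, tol = 1e-9)$root
med
```

Other quantiles follow by replacing 0.5 with the desired probability.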

Author(s)

David J. Hunter and McKalie Drown

References

Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/

Examples

# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
sb <- stepbins(binedges, bincounts, 76091)
sbpt <- stepbins(binedges, bincounts, 76091, tailShape="pareto")

plot(sb$stepPDF)
plot(sbpt$stepPDF, do.points=FALSE)
plot(sb$stepCDF, 0, sb$E+100000)

library(pracma)
integral(sb$stepPDF, 0, sb$E) # should be approximately 1
integral(function(x){1-sb$stepCDF(x)}, 0, sb$E) # should be the mean

[Package binsmooth version 0.2.2 Index]