simcounty {binsmooth} | R Documentation
Simulate data to mimic county_bins and county_true
Description
Samples from a selection of distributions (Gamma, Lognormal, Weibull, Triangle) to simulate income data in the format used in the American Community Survey data (county_bins and county_true).
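As a rough illustration of what sampling from these four families can look like, here is a minimal base R sketch. The helper name sample_incomes and every parameter value are assumptions made for this example; they are not taken from the package, whose internal choices may differ. Base R has no triangular sampler, so the sketch uses inverse-CDF sampling for that case.

```r
# Hypothetical helper: draw n incomes from one of four distribution families.
# All shape/scale values below are illustrative assumptions, not binsmooth's.
sample_incomes <- function(n, family = c("gamma", "lognormal", "weibull", "triangle")) {
  family <- match.arg(family)
  switch(family,
    gamma     = rgamma(n, shape = 2, scale = 30000),
    lognormal = rlnorm(n, meanlog = 10.8, sdlog = 0.7),
    weibull   = rweibull(n, shape = 1.5, scale = 65000),
    triangle  = {
      # Inverse-CDF sampling from a triangular distribution on [lo, hi]
      # with mode md.
      lo <- 0; md <- 40000; hi <- 250000
      u <- runif(n)
      cut_pt <- (md - lo) / (hi - lo)   # CDF value at the mode
      ifelse(u < cut_pt,
             lo + sqrt(u * (hi - lo) * (md - lo)),
             hi - sqrt((1 - u) * (hi - lo) * (hi - md)))
    }
  )
}

set.seed(1)
incomes <- sample_incomes(5000, "triangle")
summary(incomes)
```

Switching the second argument to "gamma", "lognormal", or "weibull" draws from the other families with the same call.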
Usage
simcounty(numCounties, minPop = 1000, maxPop = 100000,
bin_minimums = c(0, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
50000, 60000, 75000, 100000, 125000, 150000, 200000))
Arguments
numCounties | The number of counties to simulate data for.
minPop | Minimum population to sample (default = 1000).
maxPop | Maximum population to sample (default = 100000).
bin_minimums | Lower edges of the income bins. Defaults to the edges used in the Census data.
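To make the role of bin_minimums concrete, the following base R sketch bins a vector of raw incomes using the default edges. It is only an illustration under stated assumptions: the lognormal parameters are invented, and the convention that each bin_max is one less than the next bin's minimum (with an open-ended top bin marked NA) is an assumption about the layout, not a statement of how simcounty builds its output.

```r
# Default lower bin edges, as listed in Usage
bin_minimums <- c(0, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
                  50000, 60000, 75000, 100000, 125000, 150000, 200000)

# Illustrative raw incomes (assumed parameters)
set.seed(42)
incomes <- rlnorm(2000, meanlog = 10.8, sdlog = 0.7)

# For each income, find the bin whose lower edge is the largest one
# not exceeding it, then count households per bin
bin_index  <- findInterval(incomes, bin_minimums)
households <- tabulate(bin_index, nbins = length(bin_minimums))

# Assumed convention: upper edge is one less than the next lower edge;
# the top bin is open-ended, so its upper edge is NA
bin_max <- c(bin_minimums[-1] - 1, NA)

binned <- data.frame(bin_min = bin_minimums,
                     bin_max = bin_max,
                     households = households)
binned
```

Passing a shorter vector of edges gives coarser bins; the counts always sum to the number of simulated households.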
Details
The name of each simulated county indicates which distributions were sampled to generate it.
Value
Returns a list of two data frames:
county_bins | Simulated binned income data
county_true | Statistics computed from the raw data
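For readers who want a sense of what "statistics computed from the raw data" can involve, here is a minimal base R sketch that computes a mean, median, Gini coefficient, and Theil index from unbinned incomes. The helper names gini_raw and theil_raw are hypothetical, and apart from mean_true (which appears in the Examples) the column names are assumptions; the package's own columns and formulas may differ.

```r
# Hypothetical helper: sample Gini coefficient from raw values
gini_raw <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Hypothetical helper: Theil T index from raw (strictly positive) values
theil_raw <- function(x) {
  r <- x / mean(x)
  mean(r * log(r))
}

# Illustrative raw incomes (assumed parameters)
set.seed(7)
incomes <- rgamma(3000, shape = 2, scale = 30000)

# Column names other than mean_true are assumptions for this sketch
true_stats <- data.frame(mean_true   = mean(incomes),
                         median_true = median(incomes),
                         gini_true   = gini_raw(incomes),
                         theil_true  = theil_raw(incomes))
true_stats
```

Values like these serve as the benchmark against which estimates recovered from the binned counts can be compared.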
Author(s)
David J. Hunter and McKalie Drown
References
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
Examples
## Simulate five counties and extract the two data frames
l1 <- simcounty(5)
cb <- l1$county_bins
ct <- l1$county_true
## Fit a spline to county 103 and a step function to county 105
sbl <- splinebins(cb$bin_max[cb$fips==103], cb$households[cb$fips==103],
                  ct$mean_true[ct$fips==103])
stl <- stepbins(cb$bin_max[cb$fips==105], cb$households[cb$fips==105],
                ct$mean_true[ct$fips==105])
## Plot the two estimated densities
plot(sbl$splinePDF, 0, 300000, n=500)
plot(stl$stepPDF, do.points=FALSE, main=cb$county[cb$fips==105][1])
## Simulate one county and estimate the Gini and Theil indices from binned data
l2 <- simcounty(1)
binedges <- l2$county_bins$bin_max + 0.5 # continuity correction
bincounts <- l2$county_bins$households
splinefit <- splinebins(binedges, bincounts, l2$county_true$mean_true)
gini(splinefit)
theil(splinefit)
## Compare with the statistics computed from the raw data
l2$county_true