synthetic_population {sociome} | R Documentation |
Create a synthetic population simulating US Census areas
Description
Returns a data set of synthetic individuals based on user-specified US Census areas. The age, sex, race, and ethnicity of each individual are probabilistic, based on the demographics of the areas as reported in a user-specified US Census data set.
Usage
synthetic_population(
  geography,
  state = NULL,
  county = NULL,
  geoid = NULL,
  zcta = NULL,
  year,
  dataset = c("acs5", "acs3", "acs1", "decennial"),
  geometry = FALSE,
  cache_tables = TRUE,
  max_age = 115,
  rate = 0.25,
  key = NULL,
  seed = NULL,
  ...
)
Arguments
geography |
A character string denoting the level of US census geography at which you want to create a synthetic population. Required. |
state |
A character string specifying states whose population you want
to synthesize. Defaults to NULL. |
county |
A vector of character strings specifying the counties whose
population you want to synthesize. Defaults to NULL. |
geoid |
A character vector of GEOIDs (use quotation marks and leading
zeros). Defaults to NULL. |
zcta |
A character vector of ZCTAs or the leading digit(s) of ZCTAs (use
quotation marks and leading zeros). Defaults to NULL. Strings under 5 digits long will yield all ZCTAs that begin with those digits. Requires that |
year , dataset |
Specifies the US Census data set on which to base the demographic profile of your synthetic population.
When Important: data are not always available depending on the level of geography and data set chosen. See https://www.census.gov/programs-surveys/acs/guidance/estimates.html. |
geometry |
Logical value indicating whether or not shapefile data should
be included in the result, making the result an sf object. The shapefile data that is returned is somewhat customizable by passing
certain arguments along to the |
cache_tables |
The plural version of the |
max_age |
A single integer representing the largest possible age that can appear in the data set. Simulated age values exceeding this value will be top-coded to this value. Defaults to 115. See details. |
rate |
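A single number, passed to stats::rexp() as the rate of the exponential distribution used to synthesize ages in the highest age bracket. Defaults to 0.25. See details. |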
key |
Your Census API key as a character string. Obtain one at
http://api.census.gov/data/key_signup.html. Defaults to NULL. |
seed |
Passed onto |
... |
Additional arguments to be passed onto This may be found to be helpful when setting |
Details
Returns a tibble or sf object where each row represents a synthetic person. Each person has an age, sex, race, and ethnicity. The probability that a person has a given age, sex, race, or ethnicity equals the corresponding proportion in their census area, as reported in the user-specified US Census data set (e.g., 2010 Decennial Census or 2017 ACS 5-year estimates). The number of rows in the data set equals the number of people living in the user-specified US Census areas, as reported in the same US Census data set.
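The sampling idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's source code: the `counts` table, its values, and the choice to draw sex and age bracket jointly are all hypothetical.

```r
set.seed(1)

# Hypothetical counts for a single census area, by sex and age bracket.
# These numbers are made up for illustration; they are not Census data.
counts <- data.frame(
  sex    = c("Female", "Female", "Male", "Male"),
  age_lo = c(0, 18, 0, 18),
  age_hi = c(17, 84, 17, 84),
  n      = c(120, 380, 130, 370)
)

# One synthetic person per resident reported for the area.
total <- sum(counts$n)

# Draw a demographic category for each person, with probability equal to
# that category's share of the area's population.
drawn <- sample(nrow(counts), size = total, replace = TRUE,
                prob = counts$n / total)

synthetic <- counts[drawn, c("sex", "age_lo", "age_hi")]
rownames(synthetic) <- NULL
```

Because each person is drawn at random, the category totals in `synthetic` will be close to, but usually not exactly equal to, the counts in `counts`.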
Value
If geometry = FALSE (the default), a tibble. If geometry = TRUE is specified, an sf object.
Synthesizing ages from US Census Data
US Census data provides counts of the number of people in different age brackets of varying widths. The age_lo and age_hi columns in the output depict the age bracket of each individual in the synthetic population. There is also an age column that probabilistically generates a non-whole-number age within the age bracket. A uniform distribution (via stats::runif()) guides this age generation for all age brackets except the highest age bracket ("age 85 and over" in the extant ACS and Decennial Census data). An exponential distribution (via stats::rexp()) guides the age generation for this highest age bracket, and the user can specify rate to customize the exponential distribution that is used.
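The age-generation scheme described above might look roughly like the following base R sketch. It is not the package's implementation. The bracket bounds are hypothetical, and three details are assumptions made for illustration: using Inf as the upper bound of the open-ended bracket, adding 1 to age_hi so that a bracket such as 0 to 4 spans ages up to (but not including) 5, and adding the exponential draw to the bracket's lower bound before top-coding at max_age.

```r
set.seed(1)

max_age <- 115
rate <- 0.25

# Hypothetical bracket bounds for five synthetic people. The last person is
# in the open-ended "85 and over" bracket, marked here with Inf (assumption).
age_lo <- c(0, 5, 18, 65, 85)
age_hi <- c(4, 9, 19, 74, Inf)

top <- is.infinite(age_hi)
age <- numeric(length(age_lo))

# Closed brackets: uniform draw across the bracket (assumed to extend to
# just under age_hi + 1).
age[!top] <- stats::runif(sum(!top), min = age_lo[!top], max = age_hi[!top] + 1)

# Open-ended bracket: lower bound plus an exponential draw (assumed),
# then top-coded so that no age exceeds max_age.
age[top] <- pmin(age_lo[top] + stats::rexp(sum(top), rate = rate), max_age)
```

The mean of an exponential distribution is 1 / rate, so under this sketch rate = 0.25 places people in the highest bracket about 4 years past its lower bound on average, while rate = 10 places them about 0.1 years past it.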
Examples
## Not run:
# Wrapped in \dontrun{} because all these examples take >5 seconds
# and require a Census API key.
# Synthetic population for Utah, using the 2019 ACS 5-year estimates:
synthetic_population(geography = "state", state = "UT", year = 2019)
# Same, but make it so that survival past age 85 is highly unlikely
# (via rate = 10), and so that 87 is the maximum possible age
synthetic_population(
  geography = "state",
  state = "UT",
  year = 2019,
  max_age = 87,
  rate = 10
)
# Synthetic population of the Delmarva Peninsula at the census tract level,
# using 2000 Decennial Census data
synthetic_population(
  geography = "tract",
  geoid =
    # This two-digit GEOID is the state of Delaware.
    c("10",
      # These five-digit GEOIDs are specific counties in Virginia and Maryland
      "51001", "51131", "24015", "24029", "24035", "24011", "24041", "24019",
      "24045", "24039", "24047"),
  year = 2000,
  dataset = "decennial"
)
## End(Not run)