derive_synth_datasets {synthACS}    R Documentation
Derive synthetic micro datasets for a given geography.
Description
Derive synthetic micro datasets for each sub-geography of a given set of geographic macro data,
using its tabulations as constraints. See Details. By default, micro dataset generation is run
in parallel with load balancing. Macro data is assumed to have been pulled from the US Census API
via the acs package.
Usage
derive_synth_datasets(macro_data, parallel = TRUE, leave_cores = 2)
Arguments
macro_data    A macro dataset list: the result of pull_synth_data.
parallel    Logical, defaults to TRUE. If TRUE, micro dataset generation is run in parallel with load balancing.
leave_cores    The number of cores to leave open for other processing. Defaults to 2.
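The parallel and leave_cores arguments can be pictured with the following sketch. It is a hypothetical illustration written with base R's parallel package, not synthACS's own internals: n_workers() and run_by_geography() are made-up names, and the per-geography work is stood in for by an arbitrary function fun. parLapplyLB() is the load-balancing variant of parLapply(), which suits tasks whose running times vary.

```r
# Hypothetical helper (not part of synthACS): number of worker processes
# to start, keeping `leave_cores` cores free and never going below 1.
# detectCores() can return NA, hence na.rm = TRUE.
n_workers <- function(leave_cores = 2L) {
  max(1L, parallel::detectCores() - leave_cores, na.rm = TRUE)
}

# Hypothetical driver (not part of synthACS): apply `fun` to each
# geography's macro data, either sequentially or on a local cluster.
run_by_geography <- function(geo_list, fun, parallel = TRUE, leave_cores = 2L) {
  if (!parallel) {
    return(lapply(geo_list, fun))
  }
  cl <- parallel::makeCluster(n_workers(leave_cores))
  # Always shut the workers down, even if `fun` throws an error
  on.exit(parallel::stopCluster(cl), add = TRUE)
  # parLapplyLB() hands out work as workers become free (dynamic scheduling)
  parallel::parLapplyLB(cl, geo_list, fun)
}
```

For example, run_by_geography(as.list(1:4), function(x) x * 2, parallel = FALSE) falls back to a plain lapply() call.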
Value
A list of the input macro datasets produced by pull_synth_data and a list of synthetic micro
datasets for each geographical subset within the specified macro geography.
Details
In the absence of true micro-level datasets for a given geographic area, synthetic datasets
can be used. This function uses conditional and marginal probability distributions (at the
aggregate level) to generate synthetic micro population datasets, which are built one constraint
at a time. Taking as input the macro-level data (class "macroACS"), this function builds
synthetic micro datasets for each lower-level geographical area within the area of study.
In simplest terms, the goal is to generate a joint probability distribution for an attribute vector and to create synthetic individuals from this distribution. However, information on the full joint distribution is typically not available, so it is constructed as a product of conditional and marginal probabilities. This is done one attribute at a time, under the assumption that there is a continuum of attribute dependence: some attributes (e.g., gender, age) are more important in 'determining' others (e.g., educational attainment, marital status). These more important attributes must be assigned first, and less important attributes may be assigned later. Most of these distinctions are intuitive, but care must be taken in choosing the order in which attributes are constructed.
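A toy sketch of building a joint distribution one attribute at a time. It assumes three made-up categorical attributes with invented probabilities (not ACS data): gender is assigned first from its marginal, age is conditioned on gender, and education is conditioned on age only (a simplifying assumption for brevity). Each row's p is that combination's probability of inclusion, and synthetic individuals are then drawn from it. This illustrates the idea only; it is not how synthACS stores or computes its tables.

```r
# Toy, invented probabilities for illustration only (not ACS data).

# Marginal P(gender): assigned first
p_gender <- data.frame(gender = c("male", "female"), p = c(0.49, 0.51))

# Conditional P(age | gender): probabilities sum to 1 within each gender
p_age_given_gender <- data.frame(
  gender = rep(c("male", "female"), each = 2),
  age    = rep(c("under_35", "35_plus"), times = 2),
  p      = c(0.45, 0.55, 0.40, 0.60)
)

# Conditional P(edu | age): assumption that education depends on age only
p_edu_given_age <- data.frame(
  age = rep(c("under_35", "35_plus"), each = 2),
  edu = rep(c("no_degree", "degree"), times = 2),
  p   = c(0.60, 0.40, 0.70, 0.30)
)

# Step 1: P(gender, age) = P(gender) * P(age | gender)
joint <- merge(p_gender, p_age_given_gender, by = "gender", suffixes = c("_g", "_a"))
joint$p <- joint$p_g * joint$p_a
joint <- joint[, c("gender", "age", "p")]

# Step 2: P(gender, age, edu) = P(gender, age) * P(edu | age)
joint <- merge(joint, p_edu_given_age, by = "age", suffixes = c("_ga", "_e"))
joint$p <- joint$p_ga * joint$p_e
joint <- joint[, c("gender", "age", "edu", "p")]

# Draw synthetic individuals in proportion to each combination's probability
set.seed(1)
idx   <- sample(nrow(joint), size = 1000, replace = TRUE, prob = joint$p)
synth <- joint[idx, c("gender", "age", "edu")]
```

Because age is assigned before education, the education step can condition on age; reversing the order would require P(age | edu), which is the kind of ordering choice the paragraph above warns about.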
This function provides a synthetic population with the following characteristics, as well as each
synthetic individual's probability of inclusion: age, gender, marital status, educational
attainment, employment status, nativity, poverty status, geographic mobility in the prior year,
individual income, and race. Additional attributes of interest to the user may be added in a
similar manner via synthetic_new_attribute.
**Note:** INDIVIDUAL, not HOUSEHOLD level, synthetic population datasets are created.
References
Birkin, Mark, and M. Clarke. "SYNTHESIS-a synthetic spatial information system for urban and regional analysis: methods and examples." Environment and Planning A 20.12 (1988): 1645-1671.
See Also
pull_synth_data, acs.fetch, geo.make
Examples
## Not run:
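library(synthACS)
# pull_synth_data() queries the US Census API through the acs package,
# so a Census API key must be installed first, e.g. via acs::api.key.install()
# make geography: all census tracts in Los Angeles County, CA
la_geo <- acs::geo.make(state = "CA", county = "Los Angeles", tract = "*")
# pull data elements for creating synthetic data
la_dat <- pull_synth_data(2014, 5, la_geo)
# derive synthetic data; leave_cores = 0 leaves no cores idle
la_synthetic <- derive_synth_datasets(la_dat, leave_cores = 0)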
## End(Not run)