predict_race {wru} | R Documentation |
Race prediction function.
Description
predict_race makes probabilistic estimates of individual-level race/ethnicity.
Usage
predict_race(
voter.file,
census.surname = TRUE,
surname.only = FALSE,
census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
census.key = Sys.getenv("CENSUS_API_KEY"),
census.data = NULL,
age = FALSE,
sex = FALSE,
year = "2020",
party = NULL,
retry = 3,
impute.missing = TRUE,
skip_bad_geos = FALSE,
use.counties = FALSE,
model = "BISG",
race.init = NULL,
name.dictionaries = NULL,
names.to.use = "surname",
control = NULL
)
Arguments
voter.file |
An object of class |
census.surname |
A |
surname.only |
A |
census.geo |
An optional character vector specifying what level of
geography to use to merge in U.S. Census geographic data. Currently
|
census.key |
A character object specifying user's Census API key.
Required if If |
census.data |
A list indexed by two-letter state abbreviations,
which contains pre-saved Census geographic data.
Can be generated using |
age |
An optional |
sex |
optional |
year |
An optional character vector specifying the year of U.S. Census geographic
data to be downloaded. Use |
party |
An optional character object specifying party registration field
in |
retry |
The number of times to retry the Census website if a network interruption occurs. |
impute.missing |
Logical, defaults to TRUE. Should missing values be imputed? |
skip_bad_geos |
Logical. Option to have the function skip any geolocations that are not present
in the census data, returning a partial data set. Default is set to |
use.counties |
A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data? |
model |
Character string, either "BISG" (default) or "fBISG" (for error-correction, fully-Bayesian model). |
race.init |
Vector of initial race for each observation in voter.file.
Must be an integer vector, with 1=white, 2=black, 3=hispanic, 4=asian, and
5=other. Defaults to values obtained using |
name.dictionaries |
Optional named list of |
names.to.use |
One of 'surname', 'surname, first', or 'surname, first, middle'. Defaults to 'surname'. |
control |
List of control arguments only used when
|
Details
This function implements the Bayesian race prediction methods outlined in Imai and Khanna (2015). The function produces probabilistic estimates of individual-level race/ethnicity, based on surname, geolocation, and party.
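The Bayesian update behind this kind of prediction can be sketched with a toy calculation. This is only an illustration of Bayes' rule combining a surname-based prior with a geographic likelihood, not the package's internal code; the variable names and every number below are made up for the example.

```r
# Toy illustration of a BISG-style Bayes update (NOT wru's internal code).
# All probabilities below are hypothetical values chosen for illustration.
races <- c("whi", "bla", "his", "asi", "oth")

# Hypothetical P(race | surname): the surname-based prior
p_race_given_surname <- c(whi = 0.60, bla = 0.20, his = 0.10, asi = 0.05, oth = 0.05)

# Hypothetical P(geography | race): share of each group living in this geography
p_geo_given_race <- c(whi = 0.001, bla = 0.004, his = 0.002, asi = 0.001, oth = 0.002)

# Posterior is proportional to prior times likelihood, normalized to sum to 1
unnormalized <- p_race_given_surname * p_geo_given_race
posterior <- unnormalized / sum(unnormalized)

# Label the result the way the output columns are named
names(posterior) <- paste0("pred.", races)
print(round(posterior, 3))
```

In this made-up case the geographic information shifts the most probable category away from the one the surname alone would suggest, which is the point of combining the two sources.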
Value
Output will be an object of class data.frame. It will consist of the original user-input voter.file with additional columns with predicted probabilities for each of the five major racial categories: pred.whi for White, pred.bla for Black, pred.his for Hispanic/Latino, pred.asi for Asian/Pacific Islander, and pred.oth for Other/Mixed.
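As a rough sketch of how the returned columns might be used downstream, the snippet below builds a small stand-in data frame with the five probability columns and picks the most probable category per row. The data frame, its id column, and its values are hypothetical and only mimic the shape of the output described above.

```r
# Hypothetical stand-in for predict_race output; the values are made up.
preds <- data.frame(
  id       = c("a", "b", "c"),
  pred.whi = c(0.70, 0.10, 0.05),
  pred.bla = c(0.10, 0.10, 0.05),
  pred.his = c(0.10, 0.60, 0.10),
  pred.asi = c(0.05, 0.10, 0.70),
  pred.oth = c(0.05, 0.10, 0.10)
)

prob_cols <- c("pred.whi", "pred.bla", "pred.his", "pred.asi", "pred.oth")
prob_mat <- as.matrix(preds[, prob_cols])

# Each row of probabilities should sum to 1 (up to floating-point error)
stopifnot(isTRUE(all.equal(unname(rowSums(prob_mat)), rep(1, nrow(preds)))))

# Index of the largest probability in each row, mapped back to a short label
best <- max.col(prob_mat, ties.method = "first")
preds$most_likely <- sub("^pred\\.", "", prob_cols[best])
print(preds[, c("id", "most_likely")])
```

Collapsing to a single label discards the uncertainty in the estimates, so analyses that can work with the full probability columns may be preferable.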
Examples
data(voters)
try(predict_race(voter.file = voters, surname.only = TRUE))
## Not run:
try(predict_race(voter.file = voters, census.geo = "tract"))
## End(Not run)
## Not run:
try(predict_race(
  voter.file = voters, census.geo = "place", year = "2020"
))
## End(Not run)
## Not run:
CensusObj <- try(get_census_data(state = c("NY", "DC", "NJ")))
try(predict_race(
  voter.file = voters, census.geo = "tract", census.data = CensusObj, party = "PID"
))
## End(Not run)
## Not run:
CensusObj2 <- try(get_census_data(state = c("NY", "DC", "NJ"), age = TRUE, sex = TRUE))
try(predict_race(
  voter.file = voters, census.geo = "tract", census.data = CensusObj2, age = TRUE, sex = TRUE
))
## End(Not run)
## Not run:
CensusObj3 <- try(get_census_data(state = c("NY", "DC", "NJ"), census.geo = "place"))
try(predict_race(voter.file = voters, census.geo = "place", census.data = CensusObj3))
## End(Not run)