zi_crosswalk {zippeR}R Documentation

Crosswalk ZIP Codes with UDS, HUD, or a Custom Dictionary

Description

This function compares input data containing ZIP Codes with a crosswalk file that will append ZCTAs. This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.

Usage

zi_crosswalk(.data, input_var, zip_source = "UDS", source_var,
    source_result, year = NULL, qtr = NULL, target = NULL, query = NULL,
    by = NULL, return_max = NULL, key = NULL, return = "id")

Arguments

.data

An "input object" that is data.frame or tibble that contains ZIP Codes to be crosswalked.

input_var

The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added.

zip_source

Required character scalar or data frame; specifies the source of ZIP Code crosswalk data. This can be one of either "UDS" (default) or "HUD", or a data frame containing a custom dictionary.

source_var

Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZIP Codes.

source_result

Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZCTAs, GEOIDs, or other values.

year

Optional four-digit numeric scalar for year; varies based on source. For "UDS", years 2009 through 2023 are available. For "HUD", years 2010 through 2024 are available. Does not need to be specified when a custom dictionary is used.

qtr

Numeric scalar, required when zip_code is "HUD". Integer value between 1 and 4, representing the quarter of the year.

target

Character scalar, required when zip_code is "HUD". Can be one of "TRACT", "COUNTY", "CBSA", "CBSADIV", "CD", and "COUNTYSUB".

query

Scalar or vector, required when zip_code is "HUD". This can be a five-digit numeric or character ZIP Code, a vector of ZIP Codes, a two-letter character state abbreviation, or "all".

by

Character scalar, required when zip_code is "HUD"; the column name to use for identifying the best match for a given ZIP Code. This could be either "residential", "commercial", or "total".

return_max

Logical scalar, required when zip_code is "HUD"; if TRUE (default), only the geography with the highest proportion of the ZIP Code type will be returned. If the ZIP Code straddles two states, two records will be returned. If FALSE, all records for the ZIP Code will be returned. Where a tie exists (i.e. two geographies each contain half of all addresses), the county with the lowest GEOID value will be returned.

key

Optional when zip_code is "HUD". This should be a character string containing your HUD API key. Alternatively, it can be stored in your .RProfile as hud_key.

return

Character scalar, specifies the type of output to return. Can be one of "id" (default), which appends only the crosswalked value, or "all", which returns the entire crosswalk file appended to the source data.

Value

A tibble with crosswalk values (or optionally, the full crosswalk file) appended based on the return argument.

Examples

# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))

# UDS crosswalk

  zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)


# HUD crosswalk

  zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
    qtr = 1, target = "COUNTY", query = "MO", by = "residential",
    return_max = TRUE)


# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud

# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)

## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
  source_result = geoid)


[Package zippeR version 0.1.0 Index]