get_metadata_nhgis {ipumsr}    R Documentation

List available data sources from IPUMS NHGIS

Description

Retrieve information about available NHGIS data sources, including datasets, data tables (summary tables), time series tables, and shapefiles (GIS files).

To retrieve summary metadata for all available data sources of a particular type, use the type argument. To retrieve detailed metadata for a single data source, use the dataset, data_table, or time_series_table argument. See the metadata availability section below for information on the metadata provided for each data type.

For general information, see the NHGIS data source overview and the FAQ.

Learn more about the IPUMS API in vignette("ipums-api") and about NHGIS extract definitions in vignette("ipums-api-nhgis").

Usage

get_metadata_nhgis(
  type = NULL,
  dataset = NULL,
  data_table = NULL,
  time_series_table = NULL,
  delay = 0,
  api_key = Sys.getenv("IPUMS_API_KEY")
)

Arguments

type

One of "datasets", "data_tables", "time_series_tables", or "shapefiles" indicating the type of summary metadata to retrieve. Leave NULL if requesting metadata for a single dataset, data_table, or time_series_table.

dataset

Name of an individual dataset for which to retrieve metadata.

data_table

Name of an individual data table for which to retrieve metadata. If provided, an associated dataset must also be specified.

time_series_table

Name of an individual time series table for which to retrieve metadata.

delay

Number of seconds to delay between successive API requests, if multiple requests are needed to retrieve all records.

A delay is highly unlikely to be necessary and is intended only as a fallback in the event that you cannot retrieve all metadata records without exceeding the API rate limit.

Only used if type is provided.
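As a rough sketch of the fallback described above, the call below requests summary metadata with a pause between successive requests. The one-second value is an arbitrary assumption for illustration, and the call is guarded so that it only runs when an API key is available in the environment.

```r
library(ipumsr)

# Only query the API when a key is available in the environment
if (nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  # `delay` has an effect only because `type` is supplied; 1 second is an
  # arbitrary illustrative value, not a recommended setting
  data_tables <- get_metadata_nhgis("data_tables", delay = 1)
  print(nrow(data_tables))
}
```

In most cases the default of delay = 0 should be sufficient; consider a nonzero value only if a request fails because of the rate limit.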

api_key

API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. See set_ipums_api_key().
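A minimal sketch of supplying a key, assuming a placeholder string in place of a real key. With its default settings, set_ipums_api_key() stores the key in the IPUMS_API_KEY environment variable for the current session, which is where the default value of api_key is read from.

```r
library(ipumsr)

# Placeholder value for illustration only; substitute your own key
set_ipums_api_key("paste-your-key-here")

# The default `api_key` argument reads the key from this variable
Sys.getenv("IPUMS_API_KEY")

# Alternatively, a key can be passed explicitly for a single call:
# get_metadata_nhgis("datasets", api_key = "paste-your-key-here")
```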

Value

If type is provided, a tibble of summary metadata for all data sources of the provided type. Otherwise, a named list of metadata for the specified dataset, data_table, or time_series_table.
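Because the return type depends on which arguments are supplied, code that handles both cases may need to branch on it. The sketch below uses a hypothetical helper, describe_nhgis_metadata(), which is not part of ipumsr, and guards the API calls so that they only run when the package and an API key are available.

```r
# Hypothetical helper (not part of ipumsr): report which kind of result
# get_metadata_nhgis() returned, based on the return types described above
describe_nhgis_metadata <- function(meta) {
  if (is.data.frame(meta)) {
    # Tibbles are data frames, so summary metadata is handled here
    sprintf("summary metadata: %d records", nrow(meta))
  } else if (is.list(meta)) {
    # Detailed metadata for a single source is a named list
    sprintf(
      "detailed metadata with fields: %s",
      paste(names(meta), collapse = ", ")
    )
  } else {
    stop("Unexpected metadata object", call. = FALSE)
  }
}

if (requireNamespace("ipumsr", quietly = TRUE) &&
    nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  print(describe_nhgis_metadata(ipumsr::get_metadata_nhgis("datasets")))
  print(describe_nhgis_metadata(ipumsr::get_metadata_nhgis(dataset = "1990_STF1")))
}
```

The data frame check comes first because a data frame is also a list in R, so reversing the order would misclassify summary metadata.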

Metadata availability

The following sections summarize the metadata fields provided for each data type. Summary metadata include a subset of the fields provided for individual data sources.

Datasets:

Data tables:

Time series tables:

Shapefiles:
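The available fields can also be inspected directly on the returned objects. The following sketch, which assumes an API key is set and reuses the "1990_STF1" dataset from the examples, prints the column names of a summary tibble and the field names of a detailed metadata list.

```r
library(ipumsr)

# Only query the API when a key is available in the environment
if (nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  # Column names of the summary tibble for one data type
  print(names(get_metadata_nhgis("shapefiles")))

  # Field names of the detailed metadata list for a single dataset
  print(names(get_metadata_nhgis(dataset = "1990_STF1")))
}
```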

See Also

define_extract_nhgis() to create an IPUMS NHGIS extract definition.

Examples

## Not run: 
library(dplyr)

# Get summary metadata for all available sources of a given data type
get_metadata_nhgis("datasets")

# Filter to identify data sources of interest by their metadata values
all_tsts <- get_metadata_nhgis("time_series_tables")

tsts <- all_tsts %>%
  filter(
    grepl("Children", description),
    grepl("Families", description),
    geographic_integration == "Standardized to 2010"
  )

tsts$name

# Get detailed metadata for a single source with its associated argument:
cs5_meta <- get_metadata_nhgis(time_series_table = "CS5")
cs5_meta$geog_levels

# Use the available values when defining an NHGIS extract request
define_extract_nhgis(
  time_series_tables = tst_spec("CS5", geog_levels = "state")
)

# Detailed metadata is also provided for datasets and data tables
get_metadata_nhgis(dataset = "1990_STF1")
get_metadata_nhgis(data_table = "NP1", dataset = "1990_STF1")

# Iterate over data sources to retrieve detailed metadata for several
# records. For instance, to get variable metadata for a set of data tables:
tables <- c("NP1", "NP2", "NP10")

var_meta <- purrr::map(
  tables,
  function(dt) {
    dt_meta <- get_metadata_nhgis(dataset = "1990_STF1", data_table = dt)

    # Pause between requests to avoid hitting the API rate limit when
    # iterating over a large number of tables
    Sys.sleep(1)

    dt_meta$variables
  }
)

## End(Not run)

[Package ipumsr version 0.7.2 Index]