abs_api {readabs} | R Documentation |
ABS.Stat API functions
Description
These experimental functions provide a minimal interface to the ABS.Stat API.
More information on the ABS.Stat API can be found on the ABS website.
Note that an ABS.Stat 'dataflow' is like a table. A 'datastructure' contains metadata that describes the variables in the dataflow. To load data from the ABS.Stat API, use the functions below:
Using read_api_dataflows() you can get information on the available dataflows.
Using read_api_datastructure() you can get metadata relating to a specific dataflow, including the variables available in each dataflow.
Using read_api() you can get the data belonging to a given dataflow.
Using read_api_url() you can get the data for a given query url generated using the online data viewer.
Usage
read_api_dataflows()
read_api(
id,
datakey = NULL,
start_period = NULL,
end_period = NULL,
version = NULL
)
read_api_url(url)
read_api_datastructure(id)
Arguments
id |
A dataflow id. Use |
datakey |
A named list matching filter variables to codes. All variables
with a |
start_period |
The start period (used to filter by time). This is inclusive. The supported formats are:
|
end_period |
The end period (used to filter on time). This is inclusive.
The supported formats are the same as for |
version |
A version number. If unspecified, the latest version of the
dataset is used. Use |
url |
A complete query url |
Details
Note that the API enforces a reasonably strict gateway timeout policy. This
means that, if you're trying to access a reasonably large dataset, you will
need to filter it on the server side using the datakey. You might like to
review the data manually via the ABS website to figure out what subset of the
data you require.
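As a rough illustration of building a datakey from the metadata, the sketch below looks up codes by label in a small hand-made data frame standing in for the output of read_api_datastructure(). The columns `var` and `label` follow the usage in the Examples, and the three rows use the codes quoted there. The `code` column name and the helper `lookup_code()` are assumptions for illustration only and are not part of readabs.

```r
# Mock stand-in for the result of read_api_datastructure("ABS_C16_T10_SA").
# The `code` column name is an assumption; check names(ds) on real output.
ds <- data.frame(
  var   = c("asgs_2016", "sex_abs", "regiontype"),
  code  = c("0", "3", "DZN"),
  label = c("Australia", "Persons", "Destination Zones"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single code for a variable/label pair,
# and fail loudly if the pair is missing or ambiguous.
lookup_code <- function(ds, variable, label) {
  hit <- ds$code[ds$var == variable & ds$label == label]
  if (length(hit) != 1) {
    stop("Expected exactly one match for ", variable, " / ", label,
         ", found ", length(hit))
  }
  hit
}

# Build the named list that read_api() takes as `datakey`.
datakey <- list(
  asgs_2016 = lookup_code(ds, "asgs_2016", "Australia"),
  sex_abs   = lookup_code(ds, "sex_abs", "Persons")
)
str(datakey)

# With network access, the filtered request would then be:
# x <- read_api("ABS_C16_T10_SA", datakey = datakey)
```

Looking codes up by label rather than typing them by hand makes a wrong or misspelled label fail before any request is sent.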
Note, furthermore, that the datastructure contains a complete codebook for
the variables appearing in the relevant dataflow. Since some variables are
shared across multiple dataflows, this means that the datastructure
corresponding to a particular id may contain values for a given variable
which are not in the corresponding dataflow.
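One way to see this mismatch is to compare the codes listed in the codebook with the codes that actually occur in the downloaded data. The sketch below does so with two mock data frames. The `code` column name, the helper `unused_codes()`, the `obs_value` column, and the placeholder code "AUS" are assumptions for illustration and are not taken from readabs or the ABS.

```r
# Mock codebook, standing in for read_api_datastructure() output.
# "DZN" comes from the Examples; "AUS" is a made-up placeholder code.
ds <- data.frame(
  var   = c("regiontype", "regiontype"),
  code  = c("AUS", "DZN"),
  label = c("Australia (placeholder)", "Destination Zones"),
  stringsAsFactors = FALSE
)

# Mock data, standing in for read_api() output.
x <- data.frame(
  regiontype = c("AUS", "AUS"),
  obs_value  = c(10, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: codes listed in the codebook but absent from the data.
unused_codes <- function(ds, x, variable) {
  listed <- ds$code[ds$var == variable]
  setdiff(listed, unique(as.character(x[[variable]])))
}

unused_codes(ds, x, "regiontype")
# "DZN"
```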
Value
A data.frame
Examples
## Not run:
# List available dataflows
read_api_dataflows()
# Say we want the "Estimated resident population, Country of birth"
# dataflow, with the id ERP_COB. We load the full data set for this
# dataflow by providing the id and a start period:
read_api("ERP_COB", start_period = 2020)
# In some cases, loading a whole dataflow (as above) won't work.
# For example, the `ABS_C16_T10_SA` dataflow is very large,
# so the gateway will time out if we try to collect the full data set
try(read_api("ABS_C16_T10_SA"))
# We need to filter the dataflow before downloading it.
# To figure out how to filter it, we get metadata ('datastructure').
ds <- read_api_datastructure("ABS_C16_T10_SA")
# The `asgs_2016` code for 'Australia' is 0
ds[ds$var == "asgs_2016" & ds$label == "Australia", ]
# The `sex_abs` code for 'Persons' (i.e. all persons) is 3
ds[ds$var == "sex_abs" & ds$label == "Persons", ]
# So we have:
x <- read_api("ABS_C16_T10_SA", datakey = list(asgs_2016 = 0, sex_abs = 3))
unique(x["asgs_2016"]) # Confirming only 'Australia' level records came through
unique(x["sex_abs"]) # Confirming only 'Persons' level records came through
# Note, however, that not all values in the datastructure necessarily
# appear in the data. Filtering on such a value returns a 404 error.
ds[ds$var == "regiontype" & ds$label == "Destination Zones", ]
try(read_api("ABS_C16_T10_SA", datakey = list(regiontype = "DZN")))
# If you already have a query url, then use `read_api_url()`
wpi_url <- "https://api.data.abs.gov.au/data/ABS,WPI/all"
read_api_url(wpi_url)
## End(Not run)