R: Get Eurostat data as it is

get_eurostat_raw {restatapi}

R Documentation

Get Eurostat data as it is

Description

Download data sets from the Eurostat database.

Usage

get_eurostat_raw(
  id,
  mode = "txt",
  cache = TRUE,
  update_cache = FALSE,
  cache_dir = NULL,
  compress_file = TRUE,
  stringsAsFactors = FALSE,
  keep_flags = FALSE,
  check_toc = FALSE,
  melt = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`id`	A code name for the dataset of interest. See `search_eurostat_toc` for details on how to get an id.
`mode`	defines the format of the downloaded dataset. It can be `txt` (the default value) for Tab Separated Values (TSV), or `csv` for SDMX-CSV, or `xml` for the SDMX-ML version.
`cache`	a logical whether to do caching. Default is `TRUE`.
`update_cache`	a logical whether to update the cache. Default is `FALSE`. Can also be set with `options(restatapi_update=TRUE)`.
`cache_dir`	a path to a cache directory. The `NULL` (default) uses the memory as cache. If the `cache_dir` directory does not exist, the data is saved in the 'restatapi' directory under the temporary directory from `tempdir()`. The directory can also be set with `options(restatapi_cache_dir=...)`.
`compress_file`	a logical whether to compress the RDS-file in caching. Default is `TRUE`.
`stringsAsFactors`	if `TRUE` the variables which are not numeric are converted to factors. The default value is `FALSE`; in this case they are returned as characters.
`keep_flags`	a logical whether the observation status (flags) - e.g. "confidential", "provisional", etc. - should be kept in a separate column or if they can be removed. Default is `FALSE`. For flag values see: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/codelist/ESTAT/OBS_STATUS/?compressed=false&format=TSV&lang=en.
`check_toc`	a boolean whether to check the provided `id` in the Table of Contents (TOC) or not. The default value is `FALSE`; in this case the base URL for the download link is retrieved from the configuration file. If the value is `TRUE`, the TOC is downloaded and the `id` is checked in it. If it is found, the download link is retrieved from the TOC.
`melt`	a boolean with default value `TRUE`, used only if `mode="txt"`. If it is `FALSE`, the downloaded TSV file is not melted: the time dimension remains in columns and the flags are not processed.
`verbose`	a boolean with default `FALSE`, so detailed messages (for debugging) are not printed. Can also be set with `options(restatapi_verbose=TRUE)`.
`...`	further arguments for the `load_cfg` function
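
Several of the arguments above can also be set session-wide through `options()`. The following is a minimal sketch using only base R; the cache path is an illustrative choice and not a package default that this page confirms.

```r
# Set the three restatapi options named in the argument descriptions,
# keeping the previous values so they can be restored afterwards.
old <- options(
  restatapi_update    = TRUE,                               # same effect as update_cache = TRUE
  restatapi_verbose   = TRUE,                               # same effect as verbose = TRUE
  restatapi_cache_dir = file.path(tempdir(), "restatapi")   # illustrative path (assumption)
)

# Inspect what is currently active.
current <- list(
  update    = getOption("restatapi_update"),
  verbose   = getOption("restatapi_verbose"),
  cache_dir = getOption("restatapi_cache_dir")
)
str(current)

# Restore the previous settings (options that were unset become unset again).
options(old)
```

Saving and restoring the old values keeps the change local to the code that needs it, which is useful in scripts that should not leave verbose output switched on.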

Details

Data sets are downloaded from the Eurostat bulk download facility in CSV, TSV or SDMX format.

The id should be a value from the code column of the table of contents (get_eurostat_toc), and can be searched for with the search_eurostat_toc function. The id value can be retrieved from the Eurostat database as well: the Data Navigation Tree gives the code in parentheses after every dataset. By default, all datasets are downloaded in TSV format and cached, as they are often rather large. The datasets are cached in memory (default) or can be stored in a temporary directory if cache_dir or options(restatapi_cache_dir=...) is defined. The cache can be emptied with clean_restatapi_cache. If the id is checked in the TOC, the data will be saved in the cache with the date from the "lastUpdate" column of the TOC; otherwise it is saved with the current date.

Value

a data.table with the following columns if the default melt=TRUE is used:

`FREQ`	The frequency of the data (Annual, Semi-annual, Half-year, Quarterly, Monthly, Weekly, Daily)
dimension names	One column for each dimension in the data
`TIME_FORMAT`	A column for the time format, if the source file is SDMX-ML and the data were not loaded from a previously cached TSV download (this column is missing if the source file is TSV)
`time/TIME_PERIOD`	A column for the time dimension, where the name of the column depends on the source file (TSV/SDMX-ML)
`values/OBS_VALUE`	A column for numerical values, where the name of the column depends on the source file (TSV/SDMX-ML)
`flags/OBS_STATUS`	A column for flags if `keep_flags=TRUE`; otherwise this column is not included in the data table. The name of the column depends on the source file (TSV/SDMX-ML)

The data does not include all missing values. An observation is dropped if both its value and its flag are missing for a particular time period.
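
The dropping rule can be illustrated with a small base R sketch. It assumes cells of the form `value flag`, with `:` marking a missing value, which is how Eurostat TSV files are commonly laid out. The cell contents are invented for illustration, and this is not the package's own implementation.

```r
# Toy cells imitating raw TSV observations (invented values).
cells <- c("12.5 p", "7", ": c", ":")

# Text before the first whitespace is the value part.
num_part <- sub("[[:space:]].*$", "", cells)
# Text after the first whitespace, if any, is the flag part.
flag_part <- ifelse(grepl("[[:space:]]", cells),
                    sub("^[^[:space:]]*[[:space:]]+", "", cells),
                    NA_character_)

# ":" stands for a missing value.
num_part[num_part == ":"] <- NA

obs <- data.frame(values = as.numeric(num_part),
                  flags  = flag_part,
                  stringsAsFactors = FALSE)

# Drop only the rows where both the value and the flag are missing.
obs <- obs[!(is.na(obs$values) & is.na(obs$flags)), ]
print(obs)
```

In this sketch the cell `: c` is kept because it still carries a flag, while the bare `:` is dropped.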

If melt=FALSE, the result is a data.table where the first column contains the comma-separated values of the various dimensions, and the remaining columns contain the observations for each time period.
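
To make the difference between the two layouts concrete, here is a rough base R sketch that turns a toy table in the melt=FALSE shape into the long shape. The dimension names (`freq`, `unit`, `geo`), the column name `key` and all values are hypothetical, and the code only illustrates the idea; it is not how the package performs the melting.

```r
# Toy wide table: one key column with comma-separated dimension values,
# then one column per time period (all values invented).
wide <- data.frame(
  key    = c("A,NR,EE", "A,NR,LV"),
  `2020` = c(10, 20),
  `2021` = c(11, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split the key column into one column per dimension.
dims <- do.call(rbind, strsplit(wide$key, ",", fixed = TRUE))
colnames(dims) <- c("freq", "unit", "geo")   # hypothetical dimension names

# Stack the time columns: one row per (dimensions, time) pair.
time_cols <- setdiff(names(wide), "key")
long <- data.frame(
  dims[rep(seq_len(nrow(wide)), times = length(time_cols)), , drop = FALSE],
  time   = rep(time_cols, each = nrow(wide)),
  values = unlist(wide[time_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Missing observations are not kept in the long form.
long <- long[!is.na(long$values), ]
print(long)
```

The long form is usually easier to filter and join, while the wide form stays closer to the file as downloaded.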

Examples



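# Use a short download timeout unless the kernel release string matches one of the listed patterns
if (!(grepl("amzn|-aws|-azure ",Sys.info()['release']))) options(timeout=2)
# TSV download (default mode), keeping the flags column
head(get_eurostat_raw("agr_r_milkpr",keep_flags=TRUE))
# SDMX-ML download, with the id checked in the TOC and the cache updated
head(get_eurostat_raw("avia_par_ee",mode="xml",check_toc=TRUE,update_cache=TRUE,verbose=TRUE))
options(restatapi_update=FALSE)
# TSV download left unmelted: the time periods stay in columns
head(get_eurostat_raw("avia_par_me",mode="txt",melt=FALSE))
# TSV download cached in a directory as an uncompressed RDS file
head(get_eurostat_raw("avia_par_me",
                      mode="txt",
                      cache_dir=tempdir(),
                      compress_file=FALSE,
                      verbose=TRUE))
# Set the download timeout to 60 seconds (the R default)
options(timeout=60)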

[Package restatapi version 0.23.1 Index]