get_eurostat_bulk {restatapi} | R Documentation |
Get Eurostat data in a standardized format
Description
Download data sets from Eurostat database and put in a standardized format.
Usage
get_eurostat_bulk(
id,
cache = TRUE,
update_cache = FALSE,
cache_dir = NULL,
compress_file = TRUE,
stringsAsFactors = TRUE,
select_freq = NULL,
keep_flags = FALSE,
cflags = FALSE,
check_toc = FALSE,
verbose = FALSE,
...
)
Arguments
id |
a code name for the dataset of interest.
See |
cache |
a logical value whether to do caching. Default is |
update_cache |
a logical value with a default value |
cache_dir |
a path to a cache directory. The |
compress_file |
a logical value whether to compress the
RDS-file in caching. Default is |
stringsAsFactors |
a logical value with the default |
select_freq |
a character symbol for a time frequency when a dataset has multiple time
frequencies. Possible values are:
A = annual, S = semi-annual, H = half-year, Q = quarterly, M = monthly, W = weekly, D = daily.
The default is |
keep_flags |
a logical value whether the observation status (flags) - e.g. "confidential",
"provisional", etc. - should be kept in a separate column or if they
can be removed. Default is |
cflags |
a logical value whether the missing observations with flag 'c' - "confidential"
should be kept or not. Default is |
check_toc |
a logical value whether to check the provided |
verbose |
a logical value with default |
... |
other parameter(s) to pass on the |
Details
Data sets are downloaded from the Eurostat bulk download facility in TSV format, because in this case a smaller file has to be downloaded and processed. If there is more than one frequency, the dataset is filtered for a unique time frequency. If no frequency is selected and there are multiple frequencies in the dataset, then the most common value is used for the frequency.
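The frequency rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper pick_frequency, the freq column name and the toy values are all assumptions made up for illustration.

```r
# Toy data standing in for a dataset with mixed time frequencies
# (column names and values are made up for illustration).
toy <- data.frame(
  freq   = c("A", "Q", "Q", "Q", "M"),
  time   = c("2020", "2020-Q1", "2020-Q2", "2020-Q3", "2020-01"),
  values = c(10, 2.5, 2.4, 2.6, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of restatapi: keep a single frequency.
pick_frequency <- function(dt, select_freq = NULL) {
  if (is.null(select_freq)) {
    # No frequency requested: fall back to the most common one.
    counts <- table(dt$freq)
    select_freq <- names(counts)[which.max(counts)]
  }
  dt[dt$freq == select_freq, , drop = FALSE]
}

default_pick <- pick_frequency(toy)                     # quarterly rows only
annual_pick  <- pick_frequency(toy, select_freq = "A")  # annual row only
```

In this toy data the quarterly frequency occurs most often, so it is the one kept when select_freq is left at NULL.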
Compared to the output of the get_eurostat_raw function, the frequency (FREQ) and time format (TIME_FORMAT) columns are not included in the bulk data, and the columns for the time period, observation values and status have the standardised names "time", "values" and "flags", regardless of whether the data was previously downloaded in SDMX or TSV format.
By default all datasets are cached, as they are often rather large.
The datasets are cached in memory (default) or can be stored in a temporary directory if cache_dir or the option restatapi_cache_dir is defined.
The cache can be emptied with clean_restatapi_cache.
The id is a value from the code column of the table of contents (get_eurostat_toc) and can be searched for with the search_eurostat_toc function. The id value can be retrieved from the Eurostat database as well: the Data Navigation Tree shows the code in parentheses after every dataset.
Value
a data.table with the following columns:
dimension names | One column for each dimension in the data |
time | A column for the time dimension |
values | A column for numerical values |
flags | A column for flags if keep_flags=TRUE or cflags=TRUE; otherwise this column is not included in the data table |
The data.table does not include all missing values. Missing values are dropped if both the value and the flag are missing for a particular time period.
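The dropping rule can be sketched in base R as follows. This only illustrates the rule as stated here and is not the package's own code; the toy data frame and its values are made up.

```r
# Toy result with the standardised columns (values are made up).
toy <- data.frame(
  geo    = c("AT", "AT", "BE", "BE"),
  time   = c("2019", "2020", "2019", "2020"),
  values = c(1.2, NA, NA, 3.4),
  flags  = c(NA, "c", NA, "p"),
  stringsAsFactors = FALSE
)

# A row is dropped only when both the value and the flag are missing.
both_missing <- is.na(toy$values) & is.na(toy$flags)
kept <- toy[!both_missing, ]
```

The row for "AT" in 2020 is kept although its value is missing, because it still carries a flag; the row for "BE" in 2019 has neither and is dropped.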
See Also
get_eurostat_data, get_eurostat_raw
Examples
if (!(grepl("amzn|-aws|-azure ",Sys.info()['release']))) options(timeout=2)
head(get_eurostat_bulk("agr_r_milkpr",keep_flags=TRUE))
options(restatapi_update=TRUE)
head(get_eurostat_bulk("avia_par_ee",check_toc=TRUE))
head(get_eurostat_bulk("avia_par_ee",select_freq="A",verbose=TRUE))
options(restatapi_update=FALSE)
head(get_eurostat_bulk("agr_r_milkpr",cache_dir=tempdir(),compress_file=FALSE,verbose=TRUE))
clean_restatapi_cache(cache_dir=tempdir(),verbose=TRUE)
options(timeout=60)