get_eurostat_data {restatapi} | R Documentation |
Download, extract and filter Eurostat data
Description
Download a full or partial data set from the Eurostat database.
Usage
get_eurostat_data(
id,
filters = NULL,
lang = "en",
exact_match = TRUE,
date_filter = NULL,
label = FALSE,
select_freq = NULL,
cache = TRUE,
update_cache = FALSE,
cache_dir = NULL,
compress_file = TRUE,
stringsAsFactors = TRUE,
keep_flags = FALSE,
cflags = FALSE,
check_toc = FALSE,
local_filter = TRUE,
force_local_filter = FALSE,
mode = "xml",
verbose = FALSE,
...
)
Arguments
id |
A code name for the dataset of interest.
See |
filters |
a string, a character vector or a named list containing words to filter by the different concepts or geographical location.
If a filter is applied, only part of the dataset is downloaded through the API. The words can be
any word, Eurostat variable code, or value found in the DSD |
lang |
a character string either |
exact_match |
a boolean with the default value TRUE. |
date_filter |
a numeric or character vector containing dates to filter the dataset. If a date is given as a character string, it should follow the format yyyy[-mm][-dd], where the month and day parts are optional.
If a date filter is applied, only part of the dataset is downloaded through the API.
The default is NULL. |
label |
a boolean with the default FALSE. |
select_freq |
a character symbol for a time frequency when a dataset has multiple time
frequencies. Possible values are:
A = annual, S = semi-annual, H = half-year, Q = quarterly, M = monthly, W = weekly, D = daily.
The default is NULL. |
cache |
a logical indicating whether to do caching. Default is TRUE. |
update_cache |
a logical with a default value FALSE. |
cache_dir |
a path to a cache directory. The |
compress_file |
a logical indicating whether to compress the
RDS file when caching. Default is TRUE. |
stringsAsFactors |
if |
keep_flags |
a logical indicating whether the observation status (flags) - e.g. "confidential",
"provisional", etc. - should be kept in a separate column or
removed. Default is FALSE. |
cflags |
a logical indicating whether missing observations with the flag 'c' ("confidential")
should be kept or not. Default is FALSE. |
check_toc |
a boolean whether to check the provided |
local_filter |
a boolean indicating whether to do the filtering on the local computer when, even after filtering, the dataset has more observations
than the per-query limit of the API allows to download. The default is TRUE. |
force_local_filter |
a boolean with the default value FALSE. |
mode |
defines the format of the dataset response from the API. It can be
|
verbose |
A boolean with default FALSE. |
... |
further arguments to the for |
Details
Data sets are downloaded from the Eurostat Web Services
SDMX API if a filter is given; otherwise
the Eurostat bulk download facility is used.
If only the table id is given, the whole table is downloaded from the bulk download facility. If filters or date_filter is also defined, the SDMX REST API is used.
If the filtered dataset still has more rows than the SDMX REST API allows (1 million values at one time), the bulk download is used to retrieve the whole dataset.
By default all datasets are cached, as they are often rather large.
The datasets are cached in memory (default) or can be stored in a temporary directory if cache_dir or options(restatapi_cache_dir) is defined.
The cache can be emptied with clean_restatapi_cache.
The id is a value from the code column of the table of contents (get_eurostat_toc), and can be searched for with the search_eurostat_toc function.
The id value can be retrieved from the Eurostat database as well: the Data Navigation Tree gives the code in parentheses after every dataset.
Filtering can be done with the codes as described in the API documentation, provided in the correct order and connected with "." and "+".
If the codes are not known, the filter can be based on words, or on a mix of words and codes, given as a vector like c("AT$","Belgium","persons","Total").
Note that the filter is case sensitive. If you do not know the code or label exactly, you can use the options ignore.case=TRUE and exact_match=FALSE,
but in this case the results may include unwanted elements as well. Regular expressions can also be used in the filters parameter.
The order of the filter elements does not matter; each element is put in the correct place based on the DSD.
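The filter styles described above can be written out side by side. This is only a sketch: the values are copied from the Examples section of this page (the named list is a subset of the one used there, which is an assumption about what is sufficient), and the download is guarded so that it runs only in an interactive session with restatapi installed and network access available.

```r
# API-style filter: codes in DSD order, joined with "." and "+"
# (used with "avia_par_me" in the Examples).
code_filter <- "Q...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK"

# Word/code mix in any order (used with "nama_10_a10_e" in the Examples).
word_filter <- c("Annual", "EU28", "Belgium", "AT", "Total", "EMP_DC", "person")

# Named list: concept name = code(s). Assumption: a subset of the list
# used with "bop_its6_det" in the Examples.
named_filter <- list(geo = c("EU28", "HU"), currency = "MIO_EUR")

# The download needs network access, so it is skipped in batch runs.
if (interactive() && requireNamespace("restatapi", quietly = TRUE)) {
  dt <- restatapi::get_eurostat_data("nama_10_a10_e",
                                     filters = word_filter,
                                     date_filter = c("2008", 2002, 2013:2018))
  print(head(dt))
}
```

Which style is more convenient depends on whether the exact codes are already known; the word-based form trades precision for not having to look up the DSD first.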
The date_filter shall be a string in the format yyyy[-mm][-dd]. The month and day parts are optional; if only the year is given and the data has monthly frequency, all the data for that year is retrieved.
The string can be extended by adding "<" or ">" to the beginning or to the end of the string. In this case the date filter is treated as a range, and the date is used as its start or end date. The data will include the observation of the start/end date.
A single date range can be defined as well by concatenating two dates with ":", e.g. "2016-08:2017-03-15". As seen in the example, the dates can have different lengths: one defined only at year/month level, the other at day level.
If a date range is defined with ":", it is not possible to use the "<" or ">" characters in the date filter.
If there are multiple dates which do not form a continuous range, they can be put in a vector in any order, like c("2016-08",2013:2015,"2017-07-01"). In this case, as well, it is not possible to use the "<" or ">" characters.
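The accepted date_filter forms can be written out as plain R values. In the sketch below, is_date_part is a hypothetical helper (not part of restatapi) that only checks the yyyy[-mm][-dd] shape with a regular expression; it does not reproduce the package's own parsing and does not validate calendar dates.

```r
# Forms of date_filter described above, as plain R values.
single_year  <- 2008                                   # numeric year
open_range   <- "2007-06<"                             # range open on one side
closed_range <- "2016-08:2017-03-15"                   # ":" joins two dates of different precision
date_set     <- c("2016-08", 2013:2015, "2017-07-01")  # non-contiguous dates, any order

# Hypothetical helper: TRUE where a value has the shape yyyy[-mm][-dd].
is_date_part <- function(x) {
  grepl("^[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?$", as.character(x))
}

# Mixing numbers and strings in c() coerces every element to character.
print(date_set)
print(is_date_part(date_set))

# A closed range can be checked by splitting on ":" first.
print(is_date_part(strsplit(closed_range, ":", fixed = TRUE)[[1]]))
```

A check like this can catch a malformed string such as "2016-8" before a query is sent, under the assumption that two-digit months and days are required, as the yyyy[-mm][-dd] pattern suggests.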
Value
a data.table with the following columns:
freq | A column for the frequency of the data if there are multiple frequencies; for a single frequency this column is dropped from the data table |
dimension names | One column for each dimension in the data |
time | A column for the time dimension |
values | A column for numerical values |
flags | A column for flags if keep_flags=TRUE or cflags=TRUE; otherwise this column is not included in the data table |
The data.table does not include all missing values. An observation is dropped if both its value and its flag are missing for a particular time.
If the provided filters can be found in the DSD, they are used to query the API or are applied locally. If the applied filters in combination with date_filter and select_freq match no observation in the data set, the function returns a data.table with 0 rows.
If none of the provided filters, date_filter or select_freq can be parsed or found in the DSD, the whole dataset is downloaded through the bulk download with a warning message.
If the id does not exist, the function returns the value NULL.
See Also
search_eurostat_toc, search_eurostat_dsd, get_eurostat_bulk
Examples
load_cfg()
eu<-get("cc",envir=restatapi::.restatapi_env)
if (!(grepl("amzn|-aws|-azure ",Sys.info()['release']))) options(timeout=2)
head(get_eurostat_data("NAMA_10_GDP"))
head(get_eurostat_data("htec_cis3",update_cache=TRUE,check_toc=TRUE,verbose=TRUE))
head(get_eurostat_data("agr_r_milkpr",cache_dir="/tmp",cflags=TRUE))
options(restatapi_update=FALSE)
options(restatapi_cache_dir=file.path(tempdir(),"restatapi"))
head(get_eurostat_data("avia_gonc",select_freq="A",cache=FALSE))
head(get_eurostat_data("agr_r_milkpr",date_filter=2008,keep_flags=TRUE))
dt<-get_eurostat_data("avia_par_me",
filters="BE$",
exact_match=FALSE,
date_filter=c(2016,"2017-03","2017-07-01"),
select_freq="Q",
label=TRUE,
name=FALSE)
dt<-get_eurostat_data("agr_r_milkpr",
filters=c("BE$","Ungarn"),
lang="de",
date_filter="2007-06<",
keep_flags=TRUE)
dt<-get_eurostat_data("nama_10_a10_e",
filters=c("Annual","EU28","Belgium","AT","Total","EMP_DC","person"),
date_filter=c("2008",2002,2013:2018))
dt<-get_eurostat_data("vit_t3",
filters=c("EU28",eu$EA15,"HU$"),
date_filter=c("2015",2007))
dt<-get_eurostat_data("avia_par_me",
filters="Q...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK",
date_filter=c("2016-08","2017-07-01"),
select_freq="M")
dt<-get_eurostat_data("htec_cis3",
filters="lu",
ignore.case=TRUE)
dt<-get_eurostat_data("bop_its6_det",
filters=list(bop_item="SC",
currency="MIO_EUR",
partner="EXT_EU28",
geo=c("EU28","HU"),
stk_flow="BAL",
time="2015:2020"),
date_filter="2010:2012",
select_freq="A",
label=TRUE,
name=FALSE)
clean_restatapi_cache("/tmp",verbose=TRUE)
options(timeout=60)