R: Get data filters ids from fedstat.ru indicator web page

fedstat_get_data_ids {fedstatAPIr}

R Documentation

Get data filters ids from fedstat.ru indicator web page

Description

To query data from fedstat, we need to POST filters in the form of numeric filter identifiers. For most filters there is no rule by which their ids can be derived from the filter titles and values; the ids appear to be just indexes in the internal fedstat database. So in order to get the data, we first need to get the ids of the filter values by parsing a specific part of the JavaScript source code on the indicator web page.

Usage

fedstat_get_data_ids(
  indicator_id,
  ...,
  timeout_seconds = 180,
  retry_max_times = 3,
  httr_verbose = NULL
)

Arguments

`indicator_id`	character, indicator id/code from the indicator URL. For example, for the indicator with URL https://www.fedstat.ru/indicator/37426 the indicator id is 37426
`...`	other arguments passed to httr::GET
`timeout_seconds`	numeric, maximum time before a new GET request is tried
`retry_max_times`	numeric, maximum number of tries to GET `data_ids`
`httr_verbose`	`httr::verbose()` or NULL, outputs messages to the console about the processing of the request

Details

fedstat is known to respond slowly quite often, and sometimes the site does not respond at all. This is especially true for the web pages of the most popular indicators. For this reason, by default a GET request is attempted up to 3 times with a timeout of 180 seconds each, with pauses between attempts that start small and grow exponentially.
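The retry behaviour described above can be illustrated with a minimal base R sketch. This is not the package's actual implementation: the helper `retry_with_backoff` and its arguments `base_pause` and `sleep` are hypothetical names introduced here only to show how a pause that doubles after each failed attempt might work.

```r
# Hypothetical helper (not part of fedstatAPIr): call `fn` up to `max_times`,
# doubling the pause after every failed attempt.
retry_with_backoff <- function(fn, max_times = 3, base_pause = 1,
                               sleep = Sys.sleep) {
  for (attempt in seq_len(max_times)) {
    # Capture an error as a value instead of aborting immediately
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_times) {
      # Pauses are base_pause, 2 * base_pause, 4 * base_pause, ...
      sleep(base_pause * 2^(attempt - 1))
    }
  }
  stop("all ", max_times, " attempts failed: ", conditionMessage(result))
}

# Demo with a stand-in function that fails twice before succeeding
demo_calls <- 0
demo_fn <- function() {
  demo_calls <<- demo_calls + 1
  if (demo_calls < 3) stop("simulated timeout")
  "response"
}
demo_result <- retry_with_backoff(demo_fn, max_times = 3, base_pause = 0.1)
```

Passing the `sleep` function as an argument is only a convenience for this sketch; it lets the pauses be inspected without actually waiting.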

As a rule, requests to the indicator web page take much longer than requests for the data itself. A POST request for data is sent to a single URL, https://www.fedstat.ru/indicator/data.do?format=(excel or sdmx), for all indicators and is often quite fast. For this reason, for many indicators it makes sense to cache data_ids to speed up data download. This is not possible for all data: for weekly prices, for example, each new week adds a new filter value (the new week), the id of which can only be found on the indicator web page. But for most data (e.g. monthly frequency), time filters are trivial: there are 12 months in total with unique ids that do not change, and year ids match their values (that is, filter_value_id = filter_value, in other words 2020 = 2020).
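One way to cache data_ids is to store the table on disk and reuse it on later calls. The sketch below is only an illustration under stated assumptions: `cached_data_ids`, its `fetch` argument, and the cache file naming are hypothetical and not part of the package. It also never refreshes the cache, so it is not suitable for indicators such as weekly prices, where new filter values keep appearing.

```r
# Hypothetical helper (not part of fedstatAPIr): return data_ids from an RDS
# cache file if it exists, otherwise fetch them and store them for next time.
cached_data_ids <- function(indicator_id, fetch, cache_dir = tempdir()) {
  cache_file <- file.path(cache_dir, paste0("data_ids_", indicator_id, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data_ids <- fetch(indicator_id)
  saveRDS(data_ids, cache_file)
  data_ids
}

# Intended use (needs network access, so it is left commented out):
# data_ids <- cached_data_ids("31074", fedstatAPIr::fedstat_get_data_ids)
```

The default `tempdir()` only lasts for the current R session; a persistent directory would be needed to keep the cache between sessions.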

Correct filter_field_object_ids are needed to get data. For the sdmx format, these ids change nothing except the default sorting of the data, but specifying them incorrectly leads either to incomplete data or to no data at all. For the excel format, these ids determine the layout of the data, as in the data preview on the fedstat site. For now, only the default filter_field_object_ids are used, which are parsed from the JavaScript source code on the indicator web page. Users can specify filter_field_object_ids for each filter_field in the resulting data_ids table.
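As an illustration of changing filter_field_object_ids for one filter field, the following sketch works on a small hand-made table with the documented columns. All ids and titles in it are made-up placeholders, not real fedstat values; a real table would come from `fedstat_get_data_ids()`.

```r
# Toy data_ids table; every id and title here is a made-up placeholder
data_ids <- data.frame(
  filter_field_id = c("1", "1", "2", "2"),
  filter_field_title = c("Region", "Region", "Year", "Year"),
  filter_value_id = c("101", "102", "2020", "2021"),
  filter_value_title = c("Region A", "Region B", "2020", "2021"),
  filter_field_object_ids = rep("lineObjectIds", 4),
  stringsAsFactors = FALSE
)

# Move the whole "Year" filter field from lines to columns;
# all rows of one filter_field get the same value
is_year <- data_ids$filter_field_title == "Year"
data_ids$filter_field_object_ids[is_year] <- "columnObjectIds"
```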

Value

data.frame with all character type columns:

filter_field_id - id for filter field;
filter_field_title - filter field title string representation;
filter_value_id - id for filter field value;
filter_value_title - filter field value title string representation;
filter_field_object_ids - special strings that define the location of the filter fields. It can take the following values: lineObjectIds (filters in lines), columnObjectIds (filters in columns), filterObjectIds (hidden filters for all data).
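After editing or caching a data_ids table, it may be useful to check that it still has the structure listed above. The helper below is a hypothetical sketch, not a function from the package; it only checks the five documented columns, their character type, and the three documented filter_field_object_ids values.

```r
# Hypothetical helper (not part of fedstatAPIr): check that a table has the
# documented columns, all of character type, and only documented locations.
validate_data_ids <- function(data_ids) {
  required <- c(
    "filter_field_id", "filter_field_title",
    "filter_value_id", "filter_value_title",
    "filter_field_object_ids"
  )
  allowed <- c("lineObjectIds", "columnObjectIds", "filterObjectIds")

  missing_cols <- setdiff(required, names(data_ids))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }

  is_chr <- vapply(data_ids[required], is.character, logical(1))
  if (!all(is_chr)) {
    stop("non-character columns: ", paste(required[!is_chr], collapse = ", "))
  }

  bad_values <- setdiff(unique(data_ids$filter_field_object_ids), allowed)
  if (length(bad_values) > 0) {
    stop("unknown filter_field_object_ids: ",
         paste(bad_values, collapse = ", "))
  }

  invisible(TRUE)
}
```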

Examples

## Not run: 
# Get data filter identifiers for CPI
data_ids <- fedstat_get_data_ids("31074")

## End(Not run)

[Package fedstatAPIr version 1.0.3 Index]