query_gesla {geslaR} | R Documentation |
Query the GESLA dataset
Description
This function queries the GESLA dataset and fetches a subset of it. At least a country code and one year must be specified. Site names can also be specified, but are optional. By default, the resulting subset contains only data that were revised and recommended for analysis by the GESLA group of researchers.
Usage
query_gesla(
country,
year = NULL,
site_name = NULL,
use_flag = 1,
as_data_frame = FALSE
)
Arguments
country |
A character vector specifying the selected countries, using the three-letter ISO 3166-1 alpha-3 code. See Details. |
year |
A numeric vector specifying the selected years. If
|
site_name |
Optional character vector of site names. |
use_flag |
The default is 1, which returns only the data recommended for analysis. See Details. |
as_data_frame |
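If FALSE (the default), the result is an object of the arrow_dplyr_query class; if TRUE, it is a tbl_df (data.frame). See Details. |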
Details
The country codes must follow the three-letter ISO 3166-1 alpha-3 standard.
Note, however, that not all countries are available in the GESLA
dataset. If in doubt, check the GESLA Shiny app interface (geslaR-app)
online, or use the run_gesla_app() function to open the interface
locally.
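As a small illustration of the code format described above, the sketch below checks whether a string has the shape of an alpha-3 code before it is passed to query_gesla(). The helper looks_like_alpha3() is hypothetical and not part of geslaR; it only tests for three uppercase ASCII letters and says nothing about whether the code is a real ISO code or is available in GESLA.

```r
# Hypothetical helper (not part of geslaR): checks only the *shape* of a
# country code, i.e. exactly three uppercase ASCII letters.
looks_like_alpha3 <- function(code) {
  # perl = TRUE keeps the [A-Z] range ASCII-based regardless of locale
  grepl("^[A-Z]{3}$", code, perl = TRUE)
}

looks_like_alpha3(c("IRL", "ATA", "irl", "Ireland"))
# TRUE TRUE FALSE FALSE
```

A check like this can catch common slips (lowercase codes, full country names) early, but availability in GESLA still has to be confirmed through the Shiny app.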
The use_flag argument must be 1, 0, or c(0, 1). use_flag is a column
in the GESLA dataset that indicates whether the data should be used
for analysis: 1 (the default) means they should, and 0 means they
should not. In a typical data analysis, only the recommended data are
of interest, so this argument usually should not be changed. In some
cases, however, the non-recommended data may be of interest, so the
option is available. You can also specify c(0, 1) to fetch all the
data (usable and not usable). In any case, the use_flag column is
always present in the result and can be used for post-processing.
Please see the GESLA format documentation for more details.
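The post-processing mentioned above might look like the following minimal sketch. It uses a small hand-made data frame as a stand-in for a result fetched with use_flag = c(0, 1); the rows are invented for illustration and are not real GESLA records.

```r
library(dplyr)

# Toy stand-in for a result fetched with use_flag = c(0, 1).
# The values are made up; only the use_flag column matters here.
toy <- data.frame(
  country  = "IRL",
  year     = 2015L,
  use_flag = c(1L, 1L, 0L, 1L, 0L)
)

# How many rows are recommended (1) versus not recommended (0)?
flag_counts <- toy |> count(use_flag)

# Keep only the rows recommended for analysis
recommended <- toy |> filter(use_flag == 1)
```

The same count() and filter() calls can be applied to an actual query result, since the use_flag column is always present.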
The default argument as_data_frame = FALSE results in an object of the
arrow_dplyr_query class. The advantage is that, regardless of the size
of the resulting dataset, the object is small in memory. As with the
Arrow Table class, it can be manipulated with dplyr verbs. Please see
the documentation at the Arrow website.
Note that, if the as_data_frame argument is set to TRUE, the size of
the imported R object depends on the size of the subset. In many
situations the import can take a long time and may even be infeasible,
since the object can be larger than the available memory, which can
make R operations slow or even crash the session. Therefore, we
recommend always starting with as_data_frame = FALSE and working with
the dataset from there.
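To illustrate the lazy workflow recommended above without downloading anything, the sketch below builds a tiny Arrow Table from invented data (a stand-in for a GESLA query, not real records), applies dplyr verbs to obtain an arrow_dplyr_query object, and only then calls collect() to bring the reduced result into R. It assumes the arrow and dplyr packages are installed.

```r
library(arrow)
library(dplyr)

# Invented toy data standing in for a GESLA subset
toy <- data.frame(
  country  = c("IRL", "IRL", "IRL", "ATA"),
  year     = c(2015L, 2015L, 2017L, 2015L),
  use_flag = c(1L, 0L, 1L, 1L)
)

# Applying a dplyr verb to an Arrow Table gives a lazy query object;
# nothing is pulled into an R data frame yet
lazy <- arrow_table(toy) |>
  filter(country == "IRL", use_flag == 1L)

inherits(lazy, "arrow_dplyr_query")

# Reduce first, then collect() to materialise a small tbl_df
res <- lazy |>
  count(year) |>
  collect() |>
  arrange(year)
```

Collecting only after filtering and summarising keeps the in-memory object small, which is the main reason to begin with as_data_frame = FALSE.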
Please see vignette("intro-to-geslaR") for a detailed example.
Value
An object of class arrow_dplyr_query or a tbl_df (data.frame).
Author(s)
Fernando Mayer fernando.mayer@mu.ie
Examples
if (interactive()) {
  library(geslaR)
  library(dplyr)  # provides count() and collect()

  ## Simple query
  da <- query_gesla(country = "IRL")

  ## Select one specific year
  da <- query_gesla(country = "IRL", year = 2015)

  ## Multiple years
  da <- query_gesla(country = "IRL", year = c(2015, 2017))
  da <- query_gesla(country = "IRL", year = 2010:2017)
  da <- query_gesla(country = "IRL", year = c(2010, 2012, 2015))
  da |>
    count(year) |>
    collect()

  ## Multiple countries
  da <- query_gesla(country = c("IRL", "ATA"), year = 2015)
  da <- query_gesla(country = c("IRL", "ATA"), year = 2010:2017)
  da |>
    count(country, year) |>
    collect()

  ## Specifying a site name
  da <- query_gesla(country = "IRL", year = c(2015, 2017),
                    site_name = "Dublin_Port")
  da |>
    count(year) |>
    collect()
}