read_gesla {geslaR} | R Documentation |
Read a GESLA dataset
Description
Read a CSV or Parquet file, as exported from the GESLA Shiny app interface (geslaR-app). A "GESLA dataset file" is a subset of the GESLA dataset, fetched from the geslaR-app. When using that app, you can choose to download the selected subset in either CSV or Parquet format. Whichever option is chosen, this function automatically identifies the file type and uses the appropriate functions to import the dataset into R.
This function can be used for files exported from the online interface (hosted in this server) or from a local interface, as when using the run_gesla_app() function.
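The automatic file type identification described above can be pictured as a dispatch on the file extension. The following is only a minimal sketch under that assumption, not the geslaR implementation: the helper name read_by_extension and the toy columns site and sea_level are hypothetical, and the readers used are read_csv_arrow() and read_parquet() from the arrow package.

```r
library(arrow)

## Hypothetical helper (NOT the geslaR source): choose a reader based
## on the file extension
read_by_extension <- function(file, as_data_frame = FALSE, ...) {
    ext <- tolower(tools::file_ext(file))
    switch(ext,
        csv = read_csv_arrow(file, as_data_frame = as_data_frame, ...),
        parquet = read_parquet(file, as_data_frame = as_data_frame, ...),
        stop("`file` must end in .csv or .parquet")
    )
}

## Made-up data written to temporary files in both formats
toy <- data.frame(site = c("A", "B"), sea_level = c(1.5, 2.5))
csv_file <- tempfile(fileext = ".csv")
parquet_file <- tempfile(fileext = ".parquet")
write.csv(toy, csv_file, row.names = FALSE)
write_parquet(toy, parquet_file)

## Both formats come back as Arrow Tables by default
tab_csv <- read_by_extension(csv_file)
tab_parquet <- read_by_extension(parquet_file)

## The same file imported as a data.frame instead
df_parquet <- read_by_extension(parquet_file, as_data_frame = TRUE)

## Remove the temporary files
unlink(c(csv_file, parquet_file))
```

With this design the caller never states the format explicitly, and any unsupported extension fails early with an error.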
Usage
read_gesla(file, as_data_frame = FALSE, ...)
Arguments
file |
The file name (must end in |
as_data_frame |
If |
... |
Other arguments from |
Details
We highly recommend exporting subsets of the GESLA dataset from the geslaR-app in the Parquet file format. This format has a much smaller file size when compared to the CSV format.
In any case, the only difference between CSV and Parquet files will be the file size. When importing these data into R, both file types can be imported as an Arrow Table, which is the default (argument as_data_frame = FALSE). This way, the object created in R will have a very small size, regardless of how big the file is. To work with this type of object, you can use dplyr verbs, in the same way as with a normal data.frame (or tbl_df). Some examples can be found in the Arrow documentation.
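As an illustration of using dplyr verbs on an Arrow Table, here is a minimal sketch. It assumes the arrow and dplyr packages are installed and uses a small made-up table in place of the object returned by read_gesla(); the column names site and sea_level are hypothetical and are not taken from the GESLA dataset.

```r
library(arrow)
library(dplyr)

## Toy stand-in for an imported Arrow Table (hypothetical columns)
da <- arrow_table(
    site = c("A", "A", "B", "B"),
    sea_level = c(1.0, 2.0, 3.0, 5.0)
)

## The verbs are evaluated by Arrow; collect() brings the (small)
## result into R as a tbl_df
res <- da %>%
    filter(sea_level > 1) %>%
    group_by(site) %>%
    summarise(mean_level = mean(sea_level)) %>%
    arrange(site) %>%
    collect()
```

Only the summarised result is materialised as a regular R object; the Table itself stays in Arrow memory.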
If the as_data_frame argument is set to TRUE, the size of the imported R object will depend on the size of the dataset, regardless of the file type. In many situations this can be infeasible, since the object may be "larger-than-memory", which can make R operations slow or even crash the session. Therefore, we always recommend starting with as_data_frame = FALSE and working with the dataset from there.
See Examples below.
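One way to follow this recommendation is to reduce the data while it is still an Arrow Table and convert only the reduced subset to a data.frame. The sketch below assumes the arrow and dplyr packages; the table and its columns year and sea_level are made up for illustration and do not come from the GESLA dataset.

```r
library(arrow)
library(dplyr)

## Toy Arrow Table standing in for a large imported dataset
da <- arrow_table(
    year = rep(2000:2009, each = 100),
    sea_level = seq_len(1000) / 100
)

## Subset first, then convert only the rows and columns needed
small <- da %>%
    filter(year == 2005) %>%
    select(year, sea_level) %>%
    collect()
```

Here only 100 of the 1000 rows are ever turned into a regular R object, which is the pattern that keeps memory use low on real, much larger subsets.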
Value
An Arrow Table object, or a tbl_df (data.frame).
Author(s)
Fernando Mayer fernando.mayer@mu.ie
Examples
##------------------------------------------------------------------
## Import an internal example Parquet file
tmp <- tempdir()
file.copy(system.file(
    "extdata", "ireland.parquet", package = "geslaR"), tmp)
da <- read_gesla(paste0(tmp, "/ireland.parquet"))
## Check size in memory
object.size(da)
##------------------------------------------------------------------
## Import an internal example CSV file
tmp <- tempdir()
file.copy(system.file(
    "extdata", "ireland.csv", package = "geslaR"), tmp)
da <- read_gesla(paste0(tmp, "/ireland.csv"))
## Check size in memory
object.size(da)
##------------------------------------------------------------------
## Import an internal example Parquet file as data.frame
tmp <- tempdir()
file.copy(system.file(
    "extdata", "ireland.parquet", package = "geslaR"), tmp)
da <- read_gesla(paste0(tmp, "/ireland.parquet"),
                 as_data_frame = TRUE)
## Check size in memory
object.size(da)
##------------------------------------------------------------------
## Import an internal example CSV file as data.frame
tmp <- tempdir()
file.copy(system.file(
    "extdata", "ireland.csv", package = "geslaR"), tmp)
da <- read_gesla(paste0(tmp, "/ireland.csv"),
                 as_data_frame = TRUE)
## Check size in memory
object.size(da)
## Remove files from temporary directory
unlink(paste0(tmp, "/ireland.parquet"))
unlink(paste0(tmp, "/ireland.csv"))