read_resource {frictionless}    R Documentation

Read data from a Data Resource into a tibble data frame
Description
Reads data from a Data Resource (in a Data Package) into a tibble (a Tidyverse data frame).
The resource must be a Tabular Data Resource.
The function uses readr::read_delim() to read CSV files, passing the resource properties path, CSV dialect, column names, data types, etc.
Column names are taken from the provided Table Schema (schema), not from the header in the CSV file(s).
Usage
read_resource(package, resource_name, col_select = NULL)
Arguments
package
Data Package object, created with read_package().

resource_name
Name of the Data Resource.

col_select
Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed.
Value
A tibble() data frame with the Data Resource's tabular data.
If there are parsing problems, a warning will alert you.
You can retrieve the full details by calling problems() on your data frame.
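For example (a minimal sketch using the example Data Package shipped with frictionless, as in the Examples section below):

# Read the example Data Package and one of its resources
package <- read_package(
  system.file("extdata", "datapackage.json", package = "frictionless")
)
observations <- read_resource(package, "observations")
# Retrieve the full details of any parsing problems reported in a warning
readr::problems(observations)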
Resource properties
The Data Resource properties are handled as follows:
Path
path is required.
It can be a local path or URL, which must resolve.
Absolute paths (/) and relative parent paths (../) are forbidden to avoid security vulnerabilities.
When multiple paths are provided ("path": ["myfile1.csv", "myfile2.csv"]), the data are merged into a single data frame, in the order in which the paths are listed.
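For example, the "observations" resource in the example Data Package lists two CSV files in its path, which are returned as one merged data frame (a sketch, assuming package was read as shown under Value above and in the Examples below):

# Two files are listed in the resource's path...
package$resources[[2]]$path
# ...and read_resource() merges them into a single tibble
read_resource(package, "observations")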
Data
If path is not present, the function will attempt to read data from the data property.
schema will be ignored.
Name
name is required.
It is used to find the resource with name = resource_name.
Profile
profile is required to have the value tabular-data-resource.
File encoding
encoding (e.g. windows-1252) is required if the resource file(s) are not encoded as UTF-8.
The returned data frame will always be UTF-8.
CSV Dialect
dialect is required if the resource file(s) deviate from the default CSV settings (see below).
It can either be a JSON object or a path or URL referencing a JSON object.
Only deviating properties need to be specified, e.g. a tab-delimited file without a header row needs:

"dialect": {"delimiter": "\t", "header": false}
These are the CSV dialect properties. Some are ignored by the function:

- delimiter: default ,.
- lineTerminator: ignored, line terminator characters LF and CRLF are interpreted automatically by readr::read_delim(), while CR (used by Classic Mac OS, final release 2001) is not supported.
- doubleQuote: default true.
- quoteChar: default ".
- escapeChar: anything but \ is ignored and it will set doubleQuote to false as these fields are mutually exclusive. You can thus not escape with \" and "" in the same file.
- nullSequence: ignored, use missingValues.
- skipInitialSpace: default false.
- header: default true.
- commentChar: not set by default.
- caseSensitiveHeader: ignored, header is not used for column names, see Table schema properties.
- csvddfVersion: ignored.
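After reading a Data Package you can inspect a resource's dialect directly; this is only a sketch (assuming package was read as shown under Value above), and the property is NULL when the resource uses the default CSV settings:

# Dialect of the second resource, NULL when only default CSV settings apply
package$resources[[2]]$dialect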
File compression
Resource file(s) with path ending in .gz, .bz2, .xz, or .zip are automatically decompressed using default readr::read_delim() functionality.
Only .gz files can be read directly from URL paths.
Only the extension in path can be used to indicate compression type; the compression property is ignored.
Ignored resource properties
- title
- description
- format
- mediatype
- bytes
- hash
- sources
- licenses
Table schema properties
schema is required and must follow the Table Schema specification.
It can either be a JSON object or a path or URL referencing a JSON object.

- Field names are used as column headers.
- Field types are used as column types (see further).
- Values listed in missingValues are interpreted as NA, with "" as the default.
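A short sketch of inspecting the schema that drives column names and types (assuming package was read as shown under Value above; the same inspection appears in the Examples below):

schema <- package$resources[[2]]$schema
# Field names become column headers
purrr::map_chr(schema$fields, "name")
# Values listed here are read as NA (NULL when not set, in which case only "" applies)
schema$missingValues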
Field types
Field type is used to set the column type, as follows:

- string as character; or factor when enum is present. format is ignored.
- number as double; or factor when enum is present. Use bareNumber: false to ignore whitespace and non-numeric characters. decimalChar (. by default) and groupChar (undefined by default) can be defined, but the most frequently occurring value will be used as a global value for all number fields of that resource.
- integer as double (not integer, to avoid issues with big numbers); or factor when enum is present. Use bareNumber: false to ignore whitespace and non-numeric characters.
- boolean as logical. Non-default trueValues/falseValues are not supported.
- object as character.
- array as character.
- date as date. Supports format, with values default (ISO date), any (guess ymd) and Python/C strptime patterns, such as %a, %d %B %Y for Sat, 23 November 2013. %x is %m/%d/%y. %j, %U, %w and %W are not supported.
- time as hms::hms(). Supports format, with values default (ISO time), any (guess hms) and Python/C strptime patterns, such as %I%p%M:%S.%f%z for 8AM30:00.300+0200.
- datetime as POSIXct. Supports format, with values default (ISO datetime), any (ISO datetime) and the same patterns as for date and time. %c is not supported.
- year as date, with 01 for month and day.
- yearmonth as date, with 01 for day.
- duration as character. Can be parsed afterwards with lubridate::duration().
- geopoint as character.
- geojson as character.
- any as character.

Any other value is not allowed.
Type is guessed if not provided.
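For example, column classes follow the schema field types listed above, and a duration field can be parsed after reading (a minimal sketch, assuming package was read as shown under Value above; the ISO 8601 value below is a made-up literal, not taken from the example data):

deployments <- read_resource(package, "deployments")
# Column classes follow the schema field types
str(deployments)
# A duration field is returned as character, e.g. "PT2H30M",
# and can be parsed afterwards:
lubridate::duration("PT2H30M")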
See Also
Other read functions: read_package(), resources()
Examples
# Read a datapackage.json file
package <- read_package(
system.file("extdata", "datapackage.json", package = "frictionless")
)
package
# Read data from the resource "observations"
read_resource(package, "observations")
# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path
# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
purrr::map_chr(package$resources[[2]]$schema$fields, "type")
# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))