read_resource {frictionless}    R Documentation

Read data from a Data Resource into a tibble data frame

Description

Reads data from a Data Resource (in a Data Package) into a tibble (a Tidyverse data frame). The resource must be a Tabular Data Resource. The function uses readr::read_delim() to read CSV files, passing the resource properties path, CSV dialect, column names, data types, etc. Column names are taken from the provided Table Schema (schema), not from the header in the CSV file(s).

Usage

read_resource(package, resource_name, col_select = NULL)

Arguments

package

Data Package object, created with read_package() or create_package().

resource_name

Name of the Data Resource.

col_select

Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed.

Value

tibble() data frame with the Data Resource's tabular data. If there are parsing problems, a warning will alert you. You can retrieve the full details by calling problems() on your data frame.
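A minimal sketch of how parsing problems can be inspected. It calls readr::read_delim() directly as a stand-in for read_resource(), and the file contents and column names (id, count) are invented for illustration:

```r
library(readr)

# Hypothetical CSV file with one value ("many") that is not an integer
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "1,5", "2,many"), path)

# Reading emits a warning about parsing issues; the bad value becomes NA
parsed <- suppressWarnings(
  read_delim(
    path,
    delim = ",",
    col_types = cols(id = col_integer(), count = col_integer())
  )
)

# One row per problem, with the location and the expected and actual values
parsing_problems <- problems(parsed)
parsing_problems
```

The same problems() call should apply to a data frame returned by read_resource() whenever a warning is raised.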

Resource properties

The Data Resource properties are handled as follows:

Path

path is required. It can be a local path or a URL, and it must resolve. Absolute paths (/) and relative parent paths (../) are forbidden to avoid security vulnerabilities.

When multiple paths are provided ("path": [ "myfile1.csv", "myfile2.csv"]), data are merged into a single data frame, in the order in which the paths are listed.

Data

If path is not present, the function will attempt to read data from the data property. In that case, schema is ignored.
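As a rough sketch of this case, the code below builds a package whose only resource holds in-memory data rather than a path, then reads it back. It assumes the package's add_resource() function (not described on this page) stores a data frame in the data property; the resource name "labels" and the data frame are invented for illustration:

```r
library(frictionless)

# Hypothetical in-memory data
df <- data.frame(id = c(1L, 2L), label = c("foo", "bar"))

# Create an empty Data Package and add the data frame as a resource
package <- create_package()
package <- add_resource(package, resource_name = "labels", data = df)

# No path is present, so the data are read from the data property
labels <- read_resource(package, "labels")
labels
```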

Name

name is required. It is used to find the resource with name = resource_name.

Profile

profile is required to have the value tabular-data-resource.

File encoding

encoding (e.g. windows-1252) is required if the resource file(s) are not encoded as UTF-8. The returned data frame will always be encoded as UTF-8.
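The effect of encoding can be illustrated with readr alone. This is a sketch, assuming the property is handed to readr as a locale encoding; the file and its columns (id, name) are made up for the example:

```r
library(readr)

# Hypothetical file encoded as windows-1252: byte 0xE9 is "é" in that encoding
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("id,name\n1,caf"), as.raw(0xE9), charToRaw("\n")),
  path
)

# Declaring the source encoding lets readr convert the text to UTF-8
cafes <- read_delim(
  path,
  delim = ",",
  locale = locale(encoding = "windows-1252"),
  col_types = cols(id = col_integer(), name = col_character())
)
cafes$name
```

Without the encoding, the 0xE9 byte would be read as invalid UTF-8 and the accented character would not be recovered.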

CSV Dialect

dialect properties are required if the resource file(s) deviate from the default CSV settings (see below). dialect can be either a JSON object or a path or URL referencing a JSON object. Only deviating properties need to be specified, e.g. a tab-delimited file without a header row needs:

"dialect": {"delimiter": "\t", "header": false}
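For orientation, a dialect like this corresponds roughly to the following readr::read_delim() call. It is a sketch of the mapping, not the function's actual internals; the file and the column names (id, label), standing in for names from a schema, are assumptions:

```r
library(readr)

# Hypothetical tab-delimited file without a header row
path <- tempfile(fileext = ".tsv")
writeLines(c("1\tfoo", "2\tbar"), path)

# Column names as they might be listed in the resource's schema (assumed)
schema_names <- c("id", "label")

labels <- read_delim(
  path,
  delim = "\t",              # dialect property "delimiter"
  col_names = schema_names,  # no header row, so names are supplied explicitly
  col_types = cols(id = col_double(), label = col_character())
)
labels
```

Because the names are supplied explicitly, the first line of the file is read as data rather than as a header.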

These are the CSV dialect properties. Some are ignored by the function:

File compression

Resource file(s) with path ending in .gz, .bz2, .xz, or .zip are automatically decompressed using default readr::read_delim() functionality. Only .gz files can be read directly from URL paths. Only the extension in path can be used to indicate the compression type; the compression property is ignored.
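The extension-based decompression can be tried with readr directly. A minimal sketch with an invented local .gz file and column names:

```r
library(readr)

# Write a small gzip-compressed CSV file (hypothetical contents)
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, open = "wt")
writeLines(c("id,label", "1,foo", "2,bar"), con)
close(con)

# The .gz extension is enough for readr to decompress the file while reading
compressed <- read_delim(
  path,
  delim = ",",
  col_types = cols(id = col_integer(), label = col_character())
)
compressed
```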

Ignored resource properties

Table schema properties

schema is required and must follow the Table Schema specification. It can either be a JSON object or a path or URL referencing a JSON object.

Field types

Field type is used to set the column type, as follows:

See Also

Other read functions: read_package(), resources()

Examples

# Read a datapackage.json file
package <- read_package(
  system.file("extdata", "datapackage.json", package = "frictionless")
)

package

# Read data from the resource "observations"
read_resource(package, "observations")

# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path

# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
purrr::map_chr(package$resources[[2]]$schema$fields, "type")

# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))

[Package frictionless version 1.1.0 Index]