read_json_arrow {arrow}    R Documentation
Read a JSON file
Description
Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.
Usage
read_json_arrow(
file,
col_select = NULL,
as_data_frame = TRUE,
schema = NULL,
...
)
Arguments
file: A character file name or URI, connection, literal data (either a single string or a raw vector), an Arrow input stream, or a …

If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open. To be recognised as literal data, the input must be wrapped with I().

col_select: A character vector of column names to keep, as in the "select" argument to …

as_data_frame: Should the function return a tibble (default) or an Arrow Table?

schema: Schema that describes the table.

...: Additional options passed to …
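The col_select and schema arguments can be illustrated with a short sketch. It assumes an arrow version that accepts literal ndjson wrapped with I() (as in the Examples section) and supports the schema argument listed under Usage; the column names x, y and z and their values are purely illustrative.

```r
library(arrow)

# Two ndjson records supplied as literal data, wrapped with I().
records <- I(c('{"x": 1, "y": 2, "z": "a"}',
               '{"x": 3, "y": 4, "z": "b"}'))

# col_select: keep only the named columns in the result.
df <- read_json_arrow(records, col_select = c("x", "z"))
names(df)

# schema: declare the column types up front instead of relying on inference.
# as_data_frame = FALSE returns an Arrow Table so the declared types stay visible.
tab <- read_json_arrow(
  records,
  schema = schema(x = int32(), y = float64(), z = utf8()),
  as_data_frame = FALSE
)
tab$schema
```

Supplying a schema is mainly useful when inference would pick a type you do not want, for example int64 where int32 is enough.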
Details
If passed a path, the function detects and handles compression based on the file extension (e.g. .json.gz).
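A minimal sketch of the extension-based compression detection described above. It assumes the installed arrow build includes the gzip codec (standard builds usually do); the temporary file and its contents are illustrative.

```r
library(arrow)

# Write two ndjson records to a gzip-compressed temporary file.
# The ".json.gz" extension is what signals gzip compression to read_json_arrow().
tf <- tempfile(fileext = ".json.gz")
con <- gzfile(tf, open = "w")
writeLines(c('{"x": 1, "y": "a"}', '{"x": 2, "y": "b"}'), con)
close(con)

# No explicit decompression step: the codec is inferred from the file extension.
df <- read_json_arrow(tf)
df

unlink(tf)
```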
If schema is not provided, Arrow data types are inferred from the data:

- JSON null values convert to the null() type, but can fall back to any other type.
- JSON booleans convert to boolean().
- JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
- JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
- JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.
- Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.
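The inference rules above can be observed by reading literal ndjson into an Arrow Table, which keeps the inferred Arrow types instead of converting them to R types. This is an illustrative sketch; the column names i, n, b, s, a and o are hypothetical, and it assumes literal data wrapped with I() is accepted as in the Examples section.

```r
library(arrow)

# Literal ndjson wrapped with I(); as_data_frame = FALSE keeps the Arrow types.
tab <- read_json_arrow(
  I(c('{"i": 1, "n": 1,   "b": true,  "s": "a", "a": [1, 2], "o": {"p": 1}}',
      '{"i": 2, "n": 2.5, "b": false, "s": "b", "a": [3],    "o": {"p": 2}}')),
  as_data_frame = FALSE
)

# Per the rules above: i stays int64; n falls back to double because 2.5 is
# not an integer; b is bool; s is string (utf8); a is a list type; o is a struct.
tab$schema
```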
When as_data_frame = TRUE, Arrow types are further converted to R types.
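With the default as_data_frame = TRUE, the same kind of input comes back as an R data frame whose columns use ordinary R classes. A small sketch with illustrative column names:

```r
library(arrow)

# Default as_data_frame = TRUE: Arrow double, bool and string columns
# become R numeric, logical and character vectors.
df <- read_json_arrow(
  I(c('{"n": 1.5, "b": true,  "s": "a"}',
      '{"n": 2.5, "b": false, "s": "b"}'))
)

sapply(df, class)
```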
Value
A tibble, or a Table if as_data_frame = FALSE.
Examples
tf <- tempfile()
on.exit(unlink(tf))
writeLines('
{ "hello": 3.5, "world": false, "yo": "thing" }
{ "hello": 3.25, "world": null }
{ "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)
read_json_arrow(tf)
# Read directly from strings with `I()`
read_json_arrow(I(c('{"x": 1, "y": 2}', '{"x": 3, "y": 4}')))