df_from_file {duckplyr} | R Documentation |
Read Parquet, CSV, and other files using DuckDB
Description
df_from_file()
uses arbitrary table functions to read data.
See https://duckdb.org/docs/data/overview for documentation
of the available functions and their options.
To read multiple files with the same schema,
pass a wildcard or a character vector to the path
argument.
duckplyr_df_from_file()
is a thin wrapper around df_from_file()
that calls as_duckplyr_df()
on the output.
These functions ingest data from a file using a table function. The results are transparently converted to a data frame, but the data is only read when the resulting data frame is actually accessed.
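The character-vector form of path mentioned above can be sketched as follows. This is a minimal illustration, assuming duckplyr is installed and attached; the file prefixes part1_ and part2_ and the variable names are made up for the example.

```r
library(duckplyr)

# Two small CSV files with the same schema (temporary, illustrative paths)
paths <- tempfile(c("part1_", "part2_"), fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[1:3]), paths[1], row.names = FALSE)
write.csv(data.frame(a = 4:6, b = letters[4:6]), paths[2], row.names = FALSE)

# Pass both paths at once instead of a glob pattern
combined <- df_from_csv(paths)

# Accessing the number of rows triggers the actual read
n_rows <- nrow(combined)

unlink(paths)
```

The row order of the combined result is not checked here; only the total row count is used.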
df_from_csv()
reads a CSV file using the read_csv_auto()
table function.
duckplyr_df_from_csv()
is a thin wrapper around df_from_csv()
that calls as_duckplyr_df()
on the output.
df_from_parquet()
reads a Parquet file using the read_parquet()
table function.
duckplyr_df_from_parquet()
is a thin wrapper around df_from_parquet()
that calls as_duckplyr_df()
on the output.
df_to_parquet()
writes a data frame to a Parquet file via DuckDB.
If the data frame is a duckplyr_df, the materialization occurs outside of R.
An existing file will be overwritten.
This function requires duckdb >= 0.10.0.
Usage
df_from_file(path, table_function, ..., options = list(), class = NULL)
duckplyr_df_from_file(
  path,
  table_function,
  ...,
  options = list(),
  class = NULL
)
df_from_csv(path, ..., options = list(), class = NULL)
duckplyr_df_from_csv(path, ..., options = list(), class = NULL)
df_from_parquet(path, ..., options = list(), class = NULL)
duckplyr_df_from_parquet(path, ..., options = list(), class = NULL)
df_to_parquet(data, path)
Arguments
path |
Path to files; glob patterns are supported. |
table_function |
The name of a table-valued
DuckDB function such as "read_csv" or "read_parquet". |
... |
These dots are for future extensions and must be empty. |
options |
Arguments to the DuckDB function
indicated by table_function. |
class |
The class of the output.
By default, a tibble is created.
The returned object will always be a data frame.
Use |
data |
A data frame to be written to disk. |
Value
A data frame for df_from_file(), or a duckplyr_df for
duckplyr_df_from_file(), extended by the provided class.
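The difference between the two return types can be checked with inherits(). This is a small sketch, assuming duckplyr is installed and attached; the variable names plain and wrapped are illustrative only.

```r
library(duckplyr)

# Small temporary CSV file for the check
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), path, row.names = FALSE)

plain <- df_from_csv(path)            # documented to return a data frame
wrapped <- duckplyr_df_from_csv(path) # documented to return a duckplyr_df

# Inspect the classes of both results
is_df <- inherits(plain, "data.frame")
is_duckplyr <- inherits(wrapped, "duckplyr_df")

unlink(path)
```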
Examples
# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
# Reading is immediate
df <- df_from_csv(path)
# Materialization only upon access
names(df)
df$a
# Return as tibble, specify column types:
df_from_file(
  path,
  "read_csv",
  options = list(delim = ",", types = list(c("DOUBLE", "VARCHAR"))),
  class = class(tibble::tibble())
)
# Read multiple files at once
path2 <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 4:6, b = letters[7:9]), path2, row.names = FALSE)
duckplyr_df_from_csv(file.path(tempdir(), "duckplyr_test_*.csv"))
unlink(c(path, path2))
# Write a Parquet file:
path_parquet <- tempfile(fileext = ".parquet")
df_to_parquet(df, path_parquet)
# With a duckplyr_df, the materialization occurs outside of R:
df %>%
  as_duckplyr_df() %>%
  mutate(b = a + 1) %>%
  df_to_parquet(path_parquet)
duckplyr_df_from_parquet(path_parquet)
unlink(path_parquet)
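The options argument also works with the CSV helper, not only with df_from_file(). The sketch below reads a semicolon-delimited file by forwarding delim to the underlying DuckDB table function. It assumes duckplyr is installed and attached, and that DuckDB's CSV sniffer detects the header row for this small file; the file and variable names are illustrative.

```r
library(duckplyr)

# Write a semicolon-delimited file (illustrative data)
path_semi <- tempfile(fileext = ".csv")
write.table(
  data.frame(a = 1:3, b = c("x", "y", "z")),
  path_semi,
  sep = ";",
  row.names = FALSE,
  quote = FALSE
)

# Forward the delimiter to the DuckDB table function through options
df_semi <- df_from_csv(path_semi, options = list(delim = ";"))

# Accessing names and row count materializes the data
col_names <- names(df_semi)
n_rows <- nrow(df_semi)

unlink(path_semi)
```

Further named options can be passed in the same list; the DuckDB documentation linked in the Description lists what each table function accepts.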