open_delim_dataset {arrow} | R Documentation |
Open a multi-file dataset of CSV or other delimiter-separated format
Description
A wrapper around open_dataset() which explicitly includes parameters mirroring read_csv_arrow(), read_delim_arrow(), and read_tsv_arrow() to allow for easy switching between functions for opening single files and functions for opening datasets.
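As a minimal sketch of that switching (assuming the arrow package is installed; the file name and contents are made up for illustration), the same delim argument works for both the single-file reader and the dataset opener:

```r
library(arrow)

# Hypothetical pipe-delimited file in a temporary directory
tf <- tempfile()
dir.create(tf)
file_path <- file.path(tf, "data.txt")
writeLines(c("id|label", "1|a", "2|b"), file_path)

# Single file: read eagerly into a data frame
df <- read_delim_arrow(file_path, delim = "|")

# Dataset: same source, same arguments, but opened lazily
ds <- open_delim_dataset(file_path, delim = "|")
print(ds$schema)

# Clean up; recursive = TRUE is needed to delete a directory
unlink(tf, recursive = TRUE)
```

Only the function name changes; the Dataset is not read into memory until it is queried or collected.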
Usage
open_delim_dataset(
sources,
schema = NULL,
partitioning = hive_partition(),
hive_style = NA,
unify_schemas = NULL,
factory_options = list(),
delim = ",",
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_types = NULL,
na = c("", "NA"),
skip_empty_rows = TRUE,
skip = 0L,
convert_options = NULL,
read_options = NULL,
timestamp_parsers = NULL,
quoted_na = TRUE,
parse_options = NULL
)
open_csv_dataset(
sources,
schema = NULL,
partitioning = hive_partition(),
hive_style = NA,
unify_schemas = NULL,
factory_options = list(),
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_types = NULL,
na = c("", "NA"),
skip_empty_rows = TRUE,
skip = 0L,
convert_options = NULL,
read_options = NULL,
timestamp_parsers = NULL,
quoted_na = TRUE,
parse_options = NULL
)
open_tsv_dataset(
sources,
schema = NULL,
partitioning = hive_partition(),
hive_style = NA,
unify_schemas = NULL,
factory_options = list(),
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_types = NULL,
na = c("", "NA"),
skip_empty_rows = TRUE,
skip = 0L,
convert_options = NULL,
read_options = NULL,
timestamp_parsers = NULL,
quoted_na = TRUE,
parse_options = NULL
)
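The three signatures above differ only in delim: open_csv_dataset() and open_tsv_dataset() omit it because they fix the delimiter to a comma and a tab respectively. A quick check of that equivalence, as a sketch assuming arrow is installed and using a made-up file:

```r
library(arrow)

# Hypothetical tab-separated file in a temporary directory
tf <- tempfile()
dir.create(tf)
tsv_path <- file.path(tf, "data.tsv")
writeLines(c("x\ty", "1\tfoo", "2\tbar"), tsv_path)

# open_tsv_dataset() takes no delim argument; it should match
# open_delim_dataset() with delim = "\t"
ds_tsv <- open_tsv_dataset(tsv_path)
ds_delim <- open_delim_dataset(tsv_path, delim = "\t")

# Compare the schemas inferred by the two calls
same_schema <- ds_tsv$schema$Equals(ds_delim$schema)
print(same_schema)

unlink(tf, recursive = TRUE)
```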
Arguments
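sources |
One of:
- a string path or URI to a directory containing data files
- a FileSystem that references a directory containing data files (such as what is returned by s3_bucket())
- a string path or URI to a single file
- a character vector of paths or URIs to individual data files
- a list of Dataset objects as created by this function
- a list of DatasetFactory objects as created by dataset_factory()
When sources is a vector of file URIs, they must all use the same protocol, point to files located in the same file system, and have the same format.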
schema |
Schema for the Dataset. If NULL (the default), the schema will be inferred from the data sources.
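partitioning |
When sources is a directory path/URI, one of:
- a Schema, in which case the file paths relative to sources will be parsed, and path segments will be matched with the schema fields
- a character vector that defines the field names corresponding to those path segments (that is, you're providing the names that would correspond to a Schema but the types will be autodetected)
- a Partitioning or PartitioningFactory, such as returned by hive_partition()
- NULL for no partitioning
The default is to autodetect Hive-style partitions unless hive_style = FALSE; see the "Partitioning" section of open_dataset() for details. When sources is not a directory path/URI, partitioning is ignored.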
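hive_style |
Logical: should partitioning be interpreted as Hive-style? Default is NA, which means to inspect the file paths for Hive-style partitioning and behave accordingly.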
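unify_schemas |
logical: should all data fragments (files, Datasets) be scanned in order to create a unified schema from them? If FALSE, only the first fragment will be inspected for its schema. Use this fast path when you know and trust that all fragments have an identical schema. The default is FALSE when creating a dataset from a directory path/URI or vector of file paths/URIs (because there may be many files and scanning may be slow) but TRUE when sources is a list of Datasets (because there should be few Datasets in the list and their Schemas are already in memory).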
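factory_options |
list of optional FileSystemFactoryOptions:
- partition_base_dir: string path segment prefix to ignore when discovering partition information with DirectoryPartitioning. Not meaningful (ignored with a warning) for HivePartitioning, nor is it valid when providing a vector of file paths.
- exclude_invalid_files: logical: should files that are not valid data files be excluded? Default is FALSE because checking all files up front incurs I/O and thus will be slower, especially on remote filesystems. If FALSE and there are invalid files, there will be an error at scan time. This is the only FileSystemFactoryOption that is valid both when providing a directory path in which to discover files and when providing a vector of file paths.
- selector_ignore_prefixes: character vector of file prefixes to ignore when discovering files in a directory. If invalid files can be excluded by a common filename prefix this way, you can avoid the I/O cost of exclude_invalid_files. Not valid when providing a vector of file paths (but if you're providing the file list, you can filter invalid files yourself).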
delim |
Single character used to separate fields within a record.
quote |
Single character used to quote strings.
escape_double |
Does the file escape quotes by doubling them? That is, if this option is TRUE, the value """" represents a single quote character (").
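escape_backslash |
Does the file use backslashes to escape special characters? This is more general than escape_double, as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \n.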
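col_names |
If TRUE, the first row of the input will be used as the column names and will not be included in the data. If FALSE, column names will be generated by Arrow, starting with "f0", "f1", ..., "fN". Alternatively, you can specify a character vector of column names.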
col_types |
A compact string representation of the column types, an Arrow Schema, or NULL (the default) to infer types from the data.
na |
A character vector of strings to interpret as missing values.
skip_empty_rows |
Should blank rows be ignored altogether? If TRUE, blank rows will not be represented at all. If FALSE, they will be filled with missing values.
skip |
Number of lines to skip before reading data.
convert_options |
see CSV conversion options (CsvConvertOptions).
read_options |
see CSV reading options (CsvReadOptions).
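timestamp_parsers |
User-defined timestamp parsers. If more than one parser is specified, the CSV conversion logic will try parsing values starting from the beginning of this vector. Possible values are:
- NULL: the default, which uses the ISO-8601 parser
- a character vector of strptime parse strings
- a list of TimestampParser objects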
quoted_na |
Should missing values inside quotes be treated as missing values (the default) or as strings? (Note that this is different from the Arrow C++ default for the corresponding convert option, strings_can_be_null.)
parse_options |
see CSV parsing options (CsvParseOptions). If given, this overrides any parsing options provided in other arguments (e.g. delim, quote, etc.).
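The discovery-related arguments (sources, partitioning, hive_style, factory_options) matter most when sources is a directory. The sketch below assumes arrow is installed and builds a made-up Hive-style layout with one sub-directory per value of a year key, plus a stray backup file that is excluded through selector_ignore_prefixes; all directory and file names are illustrative only.

```r
library(arrow)

tf <- tempfile()
# Hypothetical Hive-style layout: one sub-directory per value of "year"
dir.create(file.path(tf, "year=2023"), recursive = TRUE)
dir.create(file.path(tf, "year=2024"), recursive = TRUE)
write.csv(data.frame(x = 1:2), file.path(tf, "year=2023", "part-0.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(tf, "year=2024", "part-0.csv"), row.names = FALSE)
# Stray file that should not become part of the dataset
write.csv(data.frame(x = 99L), file.path(tf, "year=2024", "backup-part-0.csv"), row.names = FALSE)

ds <- open_csv_dataset(
  tf,
  # partitioning = hive_partition() and hive_style = NA (the defaults)
  # detect the "year=..." path segments automatically
  factory_options = list(selector_ignore_prefixes = "backup")
)

print(ds$schema)  # x from the files plus the partition field year
n <- nrow(ds)     # counts rows from the two part-0.csv files only
print(n)

unlink(tf, recursive = TRUE)
```

Filtering by prefix avoids the up-front I/O that exclude_invalid_files would incur, at the cost of relying on a naming convention.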
Details
Options currently supported by read_delim_arrow() which are not supported here:
- file (instead, please specify files in sources)
- col_select (instead, subset columns after dataset creation)
- as_data_frame (instead, convert to data frame after dataset creation)
- parse_options
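For the col_select and as_data_frame alternatives listed above, a minimal sketch, assuming the dplyr package is installed alongside arrow (column names and data are made up): columns are chosen with select() after opening the dataset, and collect() returns the result as a data frame.

```r
library(arrow)
library(dplyr)  # assumed installed; provides select() and collect()

tf <- tempfile()
dir.create(tf)
file_path <- file.path(tf, "data.csv")
write.csv(
  data.frame(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE)),
  file_path,
  row.names = FALSE
)

result <- open_csv_dataset(file_path) |>
  select(a, b) |>  # in place of col_select
  collect()        # in place of as_data_frame = TRUE

unlink(tf, recursive = TRUE)
print(result)
```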
See Also
open_dataset()
Examples
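library(arrow)

# Set up directory for examples
tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))
file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

# Skip the header row, name the column "y", and treat "NULL" as missing
read_csv_arrow(file_path, na = c("", "NA", "NULL"), col_names = "y", skip = 1)
# Same arguments, but returns a Dataset; printing it shows a schema summary, not the rows
open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_names = "y", skip = 1)

# Clean up; recursive = TRUE is needed to delete a directory
unlink(tf, recursive = TRUE)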
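A further sketch, not part of the original examples, exercising timestamp_parsers. It assumes arrow is installed and uses a made-up day/month/year column: with the default ISO-8601 parser the column is inferred as a string, while a strptime format string lets type inference recognize it as a timestamp.

```r
library(arrow)

tf <- tempfile()
dir.create(tf)
file_path <- file.path(tf, "events.csv")
writeLines(c("ts,value", "25/12/2023 10:30,1", "26/12/2023 11:45,2"), file_path)

# Default parser (ISO-8601) does not match this layout
ds_default <- open_csv_dataset(file_path)
# strptime-style format tried during type inference and conversion
ds_parsed <- open_csv_dataset(file_path, timestamp_parsers = "%d/%m/%Y %H:%M")

default_type <- ds_default$schema$GetFieldByName("ts")$type$ToString()
parsed_type <- ds_parsed$schema$GetFieldByName("ts")$type$ToString()
print(c(default = default_type, parsed = parsed_type))

unlink(tf, recursive = TRUE)
```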