Dataset {arrow}    R Documentation

Multi-file datasets

Description

Arrow Datasets let you query data that has been split across multiple files. This sharding may reflect a partitioning scheme, which can accelerate queries that touch only some of the partitions (files).
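As a rough illustration of the point above, the sketch below writes a small partitioned dataset and then queries one partition. The scratch directory `data_dir`, the built-in `mtcars` data, and the choice of `cyl` as the partitioning column are assumptions made for the example, and the dplyr package is assumed to be installed.

```r
library(arrow)
library(dplyr)

# Hypothetical scratch location for the example files
data_dir <- tempfile("mtcars_ds")

# Write one Parquet file per distinct value of `cyl`, in Hive-style
# directories such as cyl=4/, cyl=6/ and cyl=8/
write_dataset(mtcars, data_dir, format = "parquet", partitioning = "cyl")

# Open all of the files as a single Dataset
ds <- open_dataset(data_dir)

# The filter is on the partitioning column, so only the matching
# partition should need to be read
six_cyl <- ds %>%
  filter(cyl == 6) %>%
  select(mpg, hp, cyl) %>%
  collect()
```

Nothing is read into memory until `collect()` is called; up to that point the query is only a description of the work to do.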

A Dataset contains one or more Fragments, such as files, of potentially differing type and partitioning.
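One way to see fragments of differing type in a single Dataset is to combine two existing Datasets. The sketch below assumes two hypothetical scratch directories, one holding Parquet and one holding Feather files with the same columns, and passes a list of Datasets to open_dataset(); the data and names are invented for illustration.

```r
library(arrow)
library(dplyr)

# Hypothetical scratch directories, one per file format
parquet_dir <- tempfile("part_parquet")
feather_dir <- tempfile("part_feather")

# Two small tables with identical columns and column types
write_dataset(data.frame(id = c("a", "b"), value = c(1.5, 2.5)),
              parquet_dir, format = "parquet")
write_dataset(data.frame(id = c("c", "d"), value = c(3.5, 4.5)),
              feather_dir, format = "feather")

ds_parquet <- open_dataset(parquet_dir, format = "parquet")
ds_feather <- open_dataset(feather_dir, format = "feather")

# Passing a list of Datasets combines them into one queryable Dataset
combined <- open_dataset(list(ds_parquet, ds_feather))

all_rows <- combined %>% collect()
```

The combined object can be queried like any other Dataset, regardless of the file format behind each child.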

For Dataset$create(), see open_dataset(), which is an alias for it.

DatasetFactory is used to provide finer control over the creation of Datasets.

Factory

DatasetFactory is used to create a Dataset, inspect the Schema of the fragments contained in it, and declare a partitioning. FileSystemDatasetFactory is a subclass of DatasetFactory for discovering files in the local file system, the only currently supported file system.
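The two-step workflow described above might look like the following sketch: build a factory, inspect the schema it discovers, then finish it into a Dataset. The scratch directory `data_dir` and the `mtcars` data are assumptions made for the example.

```r
library(arrow)
library(dplyr)

# Hypothetical scratch location holding a small partitioned dataset
data_dir <- tempfile("factory_ds")
write_dataset(mtcars, data_dir, format = "parquet", partitioning = "cyl")

# Step 1: create the factory; this discovers the files but does not
# yet build a Dataset
factory <- dataset_factory(data_dir, format = "parquet")

# Step 2: look at the schema the factory found before committing to it
discovered_schema <- factory$Inspect()

# Step 3: finish the factory to obtain a Dataset
ds <- factory$Finish()

n_rows <- ds %>% collect() %>% nrow()
```

Splitting creation this way leaves room to check, and if necessary override, the schema before any data is queried.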

For the DatasetFactory$create() factory method, see dataset_factory(), an alias for it. A DatasetFactory has:

FileSystemDatasetFactory$create() is a lower-level factory method and takes the following arguments:

Methods

A Dataset has the following methods:

FileSystemDataset has the following methods:

UnionDataset has the following methods:

See Also

open_dataset() for a simple interface to creating a Dataset


[Package arrow version 15.0.1 Index]