HF_load_dataset {fastai} | R Documentation
Load_dataset
Description
Load a dataset through the Hugging Face datasets library.
Usage
HF_load_dataset(
  path,
  name = NULL,
  data_dir = NULL,
  data_files = NULL,
  split = NULL,
  cache_dir = NULL,
  features = NULL,
  download_config = NULL,
  download_mode = NULL,
  ignore_verifications = FALSE,
  save_infos = FALSE,
  script_version = NULL,
  ...
)
Arguments
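path | Path to the dataset loading script, or the identifier of a dataset
name | Name of the dataset configuration
data_dir | Data directory of the dataset configuration
data_files | Path(s) to the source data file(s)
split | Which split(s) of the data to load; NULL loads all splits
cache_dir | Directory in which cached data is read and written
features | Feature types to use for the dataset
download_config | Download configuration parameters
download_mode | Download/generation mode (for example, whether to reuse an already downloaded and cached dataset)
ignore_verifications | Whether to skip verification of the downloaded/processed dataset information (checksums, sizes, splits)
save_infos | Whether to save the dataset information (checksums, sizes, splits)
script_version | Version of the dataset loading script to use
... | Additional arguments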
Details
This function does the following under the hood:

1. Downloads and imports the dataset loading script from "path" if it is not already cached inside the library. Processing scripts are small Python scripts that define the citation, info and format of the dataset, and contain the URL of the original data files and the code to load examples from them. You can find some of the scripts here: https://github.com/huggingface/datasets/datasets and upload your own to share them using the CLI "datasets-cli".
2. Runs the dataset loading script, which will:
   * Download the dataset file from the original URL (see the script) if it is not already downloaded and cached.
   * Process the dataset and cache it in typed Arrow tables. Arrow tables are arbitrarily long, typed tables that can store nested objects and be mapped to numpy/pandas/Python standard types. They can be accessed directly from disk, loaded into RAM, or even streamed over the web.
3. Returns a dataset built from the requested splits in "split" (default: all).
Value
data frame
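Examples
The call below is a minimal sketch of how the arguments above might be used, not a verified recipe. It assumes the fastai R package and its Python backend (including the Python "datasets" module) are installed and that the machine has network access. The dataset identifier "imdb" and the split name "train" are illustrative assumptions; substitute values that exist for your dataset. The call is guarded so that nothing is downloaded in a non-interactive session.

```r
# Illustrative values only: "imdb" and "train" are assumptions, not taken
# from this help page.
dataset_path <- "imdb"
split_spec <- "train"

# Only attempt the download in an interactive session where the fastai
# package is available; the first call fetches and caches the data.
can_run <- interactive() && requireNamespace("fastai", quietly = TRUE)

if (can_run) {
  train_ds <- fastai::HF_load_dataset(path = dataset_path, split = split_spec)
  print(train_ds)
}
```

Per the steps described in Details, a second call with the same arguments would be expected to reuse the cached loading script and cached Arrow tables rather than download them again.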