create_tabular_dataset_from_parquet_files {azuremlsdk}    R Documentation

Create an unregistered, in-memory Dataset from parquet files.

Description

Create an unregistered, in-memory Dataset from parquet files.

Usage

create_tabular_dataset_from_parquet_files(
  path,
  validate = TRUE,
  include_path = FALSE,
  set_column_types = NULL,
  partition_format = NULL
)

Arguments

path

A data path in a registered datastore or a local path.

validate

Logical indicating whether to validate that data can be loaded from the returned dataset. Defaults to TRUE. Validation requires that the data source is accessible from the current compute.

include_path

Logical indicating whether to include a column containing the path of the file from which the data was read. Defaults to FALSE. This is useful when you read multiple files and want to know which file a particular record came from, or when the file path itself carries useful information.

set_column_types

A named list that sets column data types, where each name is a column name and each value is the data type for that column.
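
As an illustration, a set_column_types list might be built with the data_type_*() helpers that azuremlsdk exports. This is a minimal sketch, not a definitive recipe: the wrapper name load_typed_parquet, the column names, and the default path are hypothetical, and it assumes a workspace config.json is available when the function is called.

```r
# Hypothetical helper; nothing contacts Azure until the function is called.
load_typed_parquet <- function(path_on_datastore = "sales/*.parquet") {
  ws <- azuremlsdk::load_workspace_from_config()
  datastore <- azuremlsdk::get_default_datastore(ws)

  # Names are column names (assumed here); values are data type objects.
  column_types <- list(
    order_id   = azuremlsdk::data_type_long(),
    amount     = azuremlsdk::data_type_double(),
    order_date = azuremlsdk::data_type_datetime()
  )

  azuremlsdk::create_tabular_dataset_from_parquet_files(
    path = azuremlsdk::data_path(datastore, path_on_datastore),
    set_column_types = column_types
  )
}
```

Columns that are not named in the list keep whatever type is read from the files.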

partition_format

Specify the partition format of the path. The format part '{x}' creates a string column named 'x', and '{x:yyyy/MM/dd/HH/mm/ss}' creates a datetime column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given a file path '../USA/2019/01/01/data.parquet' where the data is partitioned by country and time, the format '/{Country}/{PartitionDate:yyyy/MM/dd}/data.parquet' creates a 'Country' column of string type and a 'PartitionDate' column of datetime type.
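
The country and date example can be sketched as a call. This sketch assumes the curly-brace placeholder syntax of the underlying Azure ML Python SDK, a hypothetical folder layout under 'sales/' on the default datastore, and a hypothetical wrapper name; adjust all three to the actual data.

```r
# Assumed layout on the datastore: sales/<Country>/<yyyy>/<MM>/<dd>/data.parquet
# '{Country}' yields a string column; '{PartitionDate:yyyy/MM/dd}' a datetime column.
partition_format <- "/{Country}/{PartitionDate:yyyy/MM/dd}/data.parquet"

# Hypothetical helper; nothing contacts Azure until the function is called.
load_partitioned_parquet <- function(path_on_datastore = "sales/**/data.parquet") {
  ws <- azuremlsdk::load_workspace_from_config()
  datastore <- azuremlsdk::get_default_datastore(ws)

  azuremlsdk::create_tabular_dataset_from_parquet_files(
    path = azuremlsdk::data_path(datastore, path_on_datastore),
    partition_format = partition_format
  )
}
```

The format string begins at the first partition key ('Country') and runs to the end of the file path, as the argument description requires.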

Value

The Tabular Dataset object.
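
Because the returned dataset is unregistered and held in memory, a typical next step is to inspect it or register it with the workspace. The following is a rough sketch under assumptions: the wrapper name and its arguments are hypothetical, and a workspace config.json must be available when the function is called.

```r
# Hypothetical helper: build the dataset, preview it, then register it.
preview_and_register <- function(path_on_datastore, dataset_name) {
  ws <- azuremlsdk::load_workspace_from_config()
  datastore <- azuremlsdk::get_default_datastore(ws)

  dataset <- azuremlsdk::create_tabular_dataset_from_parquet_files(
    path = azuremlsdk::data_path(datastore, path_on_datastore),
    include_path = TRUE  # adds a column with each record's source file path
  )

  # Load the records into an R data frame for inspection.
  df <- azuremlsdk::load_dataset_into_data_frame(dataset)
  print(head(df))

  # The dataset stays unregistered until register_dataset() is called.
  azuremlsdk::register_dataset(ws, dataset, name = dataset_name)
}
```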

See Also

data_path


[Package azuremlsdk version 1.10.0 Index]