text_dataset_from_directory {keras3} | R Documentation |
Generates a tf.data.Dataset from text files in a directory.
Description
If your directory structure is:
main_directory/
...class_a/
......a_text_1.txt
......a_text_2.txt
...class_b/
......b_text_1.txt
......b_text_2.txt
Then calling text_dataset_from_directory(main_directory, labels='inferred')
will return a tf.data.Dataset that yields batches of texts from the
subdirectories class_a and class_b, together with labels 0 and 1
(0 corresponding to class_a and 1 corresponding to class_b).
Only .txt files are supported at this time.
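The layout and label assignment described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the temporary location, the sample file contents, and the helper variables class_names and label_index are assumptions made for the example. The mapping is computed here by sorting the subdirectory names, which reproduces the class_a -> 0, class_b -> 1 assignment stated above. The call to text_dataset_from_directory() itself is guarded, because it needs keras3 with a working backend.

```r
# Build the example layout from the description in a temporary directory.
main_directory <- file.path(tempdir(), "main_directory")
for (class_name in c("class_a", "class_b")) {
  class_dir <- file.path(main_directory, class_name)
  dir.create(class_dir, recursive = TRUE, showWarnings = FALSE)
  prefix <- sub("class_", "", class_name)  # "a" or "b"
  for (i in 1:2) {
    writeLines(
      sprintf("Sample text %d for %s.", i, class_name),  # made-up content
      file.path(class_dir, sprintf("%s_text_%d.txt", prefix, i))
    )
  }
}

# Subdirectory names in sorted order; position minus one gives the label
# index from the description (class_a -> 0, class_b -> 1).
class_names <- sort(list.dirs(main_directory, full.names = FALSE, recursive = FALSE))
label_index <- setNames(seq_along(class_names) - 1L, class_names)
print(label_index)

# The actual call requires keras3 and a configured backend, so it is guarded.
if (requireNamespace("keras3", quietly = TRUE)) {
  ds <- tryCatch(
    keras3::text_dataset_from_directory(main_directory, labels = "inferred"),
    error = function(e) {
      message("Could not build the dataset: ", conditionMessage(e))
      NULL
    }
  )
}
```

If the guarded call succeeds, ds holds the dataset built from the four files; otherwise the error message is reported and ds is NULL.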
Usage
text_dataset_from_directory(
  directory,
  labels = "inferred",
  label_mode = "int",
  class_names = NULL,
  batch_size = 32L,
  max_length = NULL,
  shuffle = TRUE,
  seed = NULL,
  validation_split = NULL,
  subset = NULL,
  follow_links = FALSE,
  verbose = TRUE
)
Arguments
directory |
Directory where the data is located.
If |
labels |
Either |
label_mode |
String describing the encoding of
|
class_names |
Only valid if |
batch_size |
Size of the batches of data. Defaults to 32.
If |
max_length |
Maximum size of a text string. Texts longer than this will
be truncated to |
shuffle |
Whether to shuffle the data. Defaults to TRUE. |
seed |
Optional random seed for shuffling and transformations. |
validation_split |
Optional float between 0 and 1, fraction of data to reserve for validation. |
subset |
Subset of the data to return.
One of |
follow_links |
Whether to visit subdirectories pointed to by symlinks.
Defaults to FALSE. |
verbose |
Whether to display information on the number of classes and
the number of files found. Defaults to TRUE. |
Value
A tf.data.Dataset object.
If label_mode is NULL, it yields string tensors of shape (batch_size,), containing the contents of a batch of text files.
Otherwise, it yields a tuple (texts, labels), where texts has shape (batch_size,) and labels follows the format described below.
Rules regarding labels format:
If label_mode is int, the labels are an int32 tensor of shape (batch_size,).
If label_mode is binary, the labels are a float32 tensor of 1s and 0s of shape (batch_size, 1).
If label_mode is categorical, the labels are a float32 tensor of shape (batch_size, num_classes), representing a one-hot encoding of the class index.
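The three label formats above can be pictured with plain R vectors and matrices. This is only a sketch of the shapes and values: the batch of class indices is hypothetical, and base R objects stand in for the tensors the dataset actually yields.

```r
# Hypothetical class indices for a batch of three files and two classes.
class_index <- c(0L, 1L, 1L)
num_classes <- 2L

# label_mode = "int": one integer per file, shape (batch_size,).
labels_int <- class_index

# label_mode = "binary": 0/1 values stored as floats, shape (batch_size, 1).
labels_binary <- matrix(as.numeric(class_index), ncol = 1)

# label_mode = "categorical": one-hot rows, shape (batch_size, num_classes).
# Row i of the identity matrix is the one-hot vector for class index i - 1.
labels_categorical <- diag(num_classes)[class_index + 1L, , drop = FALSE]

print(dim(labels_binary))
print(labels_categorical)
```

Each row of labels_categorical contains a single 1 in the column of its class, which is what "one-hot encoding of the class index" means here.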
See Also
Other dataset utils:
audio_dataset_from_directory()
image_dataset_from_directory()
split_dataset()
timeseries_dataset_from_array()
Other utils:
audio_dataset_from_directory()
clear_session()
config_disable_interactive_logging()
config_disable_traceback_filtering()
config_enable_interactive_logging()
config_enable_traceback_filtering()
config_is_interactive_logging_enabled()
config_is_traceback_filtering_enabled()
get_file()
get_source_inputs()
image_array_save()
image_dataset_from_directory()
image_from_array()
image_load()
image_smart_resize()
image_to_array()
layer_feature_space()
normalize()
pad_sequences()
set_random_seed()
split_dataset()
timeseries_dataset_from_array()
to_categorical()
zip_lists()
Other preprocessing:
image_dataset_from_directory()
image_smart_resize()
timeseries_dataset_from_array()