Commit 91a57fe

tomvdw, The TensorFlow Datasets Authors authored and The TensorFlow Datasets Authors committed
Correct the docstring of NoShuffleBeamWriter
PiperOrigin-RevId: 736076896
1 parent d0c47fc commit 91a57fe

1 file changed: +13 -1 lines changed

tensorflow_datasets/core/writer.py

Lines changed: 13 additions & 1 deletion
@@ -722,7 +722,19 @@ def finalize(self) -> tuple[list[int], int]:


 class NoShuffleBeamWriter:
-  """Shuffles / writes Examples beam collection to sharded files."""
+  """Writes examples to sharded files using Beam in a non-deterministic way.
+
+  The number of shards and in what shard an example is written is
+  non-deterministic. This means that there may be a shards with few examples
+  and other shards with many examples.
+
+  This writer class should only be used when the ordering of the examples is not
+  important, e.g., when a file format that supports random access is used.
+
+  The speed of writing is faster than the Writer class because it does not need
+  to shuffle the examples and make sure that the examples are written to the
+  correct shard.
+  """

   _OUTPUT_TAG_BUCKETS_LEN_SIZE = "tag_buckets_len_size"
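
The trade-off described in the new docstring can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the TFDS implementation: the function names below are hypothetical, the deterministic path is assumed to place examples by a stable hash of their key, and the non-deterministic path is simulated with a random number generator standing in for whichever Beam worker happens to process an example.

```python
import hashlib
import random


def deterministic_shard(key: str, num_shards: int) -> int:
    """Hypothetical stand-in for a shuffling writer: a stable hash of the
    key picks the shard, so every run places the example identically."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def write_without_shuffle(keys, num_workers, rng):
    """Hypothetical stand-in for a no-shuffle writer: each example lands in
    the shard of whichever worker picks it up (simulated with `rng`)."""
    shards = [[] for _ in range(num_workers)]
    for key in keys:
        shards[rng.randrange(num_workers)].append(key)
    # Workers that received nothing produce no shard, so the number of
    # shards is not fixed and shard sizes can be unbalanced.
    return [shard for shard in shards if shard]


def read_in_random_order(shards, rng):
    """With a random-access format, the reader can choose its own order,
    so the order in which examples were written stops mattering."""
    examples = [key for shard in shards for key in shard]
    rng.shuffle(examples)
    return examples


keys = [f"example_{i}" for i in range(20)]
shards = write_without_shuffle(keys, num_workers=4, rng=random.Random())
print([len(shard) for shard in shards])  # shard sizes vary between runs
```

Every example is still written exactly once; only its shard and position change between runs, which is why the docstring limits this writer to cases where example order is not important.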
