1 file changed, +13 -1 lines changed

@@ -722,7 +722,19 @@ def finalize(self) -> tuple[list[int], int]:
 
 
 class NoShuffleBeamWriter:
-  """Shuffles / writes Examples beam collection to sharded files."""
+  """Writes examples to sharded files using Beam in a non-deterministic way.
+
+  Both the number of shards and the shard to which an example is written are
+  non-deterministic. This means that there may be shards with few examples
+  and other shards with many examples.
+
+  This writer class should only be used when the ordering of the examples is not
+  important, e.g., when a file format that supports random access is used.
+
+  Writing is faster than with the Writer class because it does not need to
+  shuffle the examples and make sure that the examples are written to the
+  correct shard.
+  """
 
   _OUTPUT_TAG_BUCKETS_LEN_SIZE = "tag_buckets_len_size"
 
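
To make the trade-off described in the new docstring concrete, here is a toy sketch in plain Python. It is not TFDS or Beam code: `write_unshuffled`, `shuffled_read`, and the sample `bundles` are hypothetical names introduced only for illustration. It assumes each incoming bundle is written as its own shard in arrival order, so shard sizes come out uneven, and that the file format allows random access, so a reader can still visit the examples in a seeded shuffled order.

```python
import random


def write_unshuffled(bundles):
    """Toy writer (assumption): each non-empty bundle becomes one shard, as-is."""
    return [list(bundle) for bundle in bundles if bundle]


def shuffled_read(shards, seed):
    """Toy random-access reader: visit every (shard, offset) pair in seeded order."""
    index = [
        (shard_id, offset)
        for shard_id, shard in enumerate(shards)
        for offset in range(len(shard))
    ]
    random.Random(seed).shuffle(index)
    return [shards[shard_id][offset] for shard_id, offset in index]


# Hypothetical bundles of uneven size, standing in for whatever the workers emit.
bundles = [[0, 1, 2, 3, 4, 5, 6], [7], [], [8, 9]]

shards = write_unshuffled(bundles)
shard_sizes = [len(shard) for shard in shards]
examples = shuffled_read(shards, seed=0)

print("shard sizes:", shard_sizes)
print("read order:", examples)
```

With these inputs the shards hold 7, 1, and 2 examples, yet every example is still read exactly once and the read order depends only on the seed, not on how the examples were laid out at write time.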