Releases: IBM/spark-s3-shuffle
Integrate build for Spark 3.2.3
This build integrates pre-built binaries for Spark 3.2.3.
Bug fixes and performance improvements
Configuration changes:
- Renamed the shuffle manager from `org.apache.spark.shuffle.S3ShuffleManager` to `org.apache.spark.shuffle.sort.S3ShuffleManager`.
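The renamed setting can be sketched as a Scala map of configuration keys. This assumes the I/O plugin class keeps the name `org.apache.spark.shuffle.S3ShuffleDataIO` used in release 0.5. The object `S3ShuffleConf` and its helper `asSubmitArgs` are hypothetical, and plugin-specific settings (for example the S3 root directory) are omitted.

```scala
// Hypothetical helper object: collects the settings needed to activate the plugin.
object S3ShuffleConf {
  val settings: Map[String, String] = Map(
    // New package name introduced in this release.
    "spark.shuffle.manager" -> "org.apache.spark.shuffle.sort.S3ShuffleManager",
    // Assumption: the I/O plugin class name is unchanged from release 0.5.
    "spark.shuffle.sort.io.plugin.class" -> "org.apache.spark.shuffle.S3ShuffleDataIO"
  )

  // Render the settings as spark-submit flags, sorted by key for stable output.
  def asSubmitArgs: Seq[String] =
    settings.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

In an application, the map could be passed to `SparkConf.setAll`, or the flags from `asSubmitArgs` could be appended to a `spark-submit` command line.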
Bugfixes:
- Fixed an issue in serialized shuffle that prevented TeraSort from working properly.
- Fixed an off-by-one error in the bypass shuffle block manager (thanks to @fhalde).
Improvements:
- This plugin now relies on `S3ShuffleDataIO` to write the shuffle output to the target location.
- Implemented the optional interface `createSingleFileMapOutputWriter` for the `S3ShuffleDataIO` component. This improves performance when Spark can write a shuffle file without spilling.
- Removed `S3SortShuffleWriter` and `S3BypassMergeSortShuffleWriter`, since these classes duplicated existing functionality.
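The optional-interface pattern behind `createSingleFileMapOutputWriter` can be illustrated with a minimal, self-contained sketch. The traits `ExecutorComponents` and `SingleFileWriter` below are hypothetical simplifications of Spark's shuffle I/O plugin API, not its real signatures. The idea shown is that a plugin which does not override the hook returns an empty `Optional`, and the caller then falls back to the regular per-partition write path.

```scala
import java.util.Optional

// Hypothetical, simplified stand-in for a single-file map output writer.
trait SingleFileWriter {
  def transfer(spillFile: String): String
}

// Hypothetical, simplified stand-in for a plugin's executor-side components.
trait ExecutorComponents {
  // Optional hook: returning an empty Optional means "no single-file fast path".
  def createSingleFileMapOutputWriter(shuffleId: Int, mapId: Long): Optional[SingleFileWriter] =
    Optional.empty[SingleFileWriter]()
}

// A plugin that keeps the default (no fast path).
object DefaultComponents extends ExecutorComponents

// A plugin that implements the hook and moves the whole file in one step.
object S3LikeComponents extends ExecutorComponents {
  override def createSingleFileMapOutputWriter(shuffleId: Int, mapId: Long): Optional[SingleFileWriter] =
    Optional.of[SingleFileWriter](new SingleFileWriter {
      def transfer(spillFile: String): String = s"single transfer of $spillFile"
    })
}

object SingleFileSketch {
  // Caller side: use the fast path when offered, otherwise fall back to per-partition writes.
  def writeMapOutput(components: ExecutorComponents, spillFile: String): String = {
    val single = components.createSingleFileMapOutputWriter(shuffleId = 0, mapId = 0L)
    if (single.isPresent) single.get().transfer(spillFile)
    else s"per-partition copy of $spillFile"
  }
}
```

With this sketch, `SingleFileSketch.writeMapOutput(DefaultComponents, "spill.data")` takes the fallback branch, while `S3LikeComponents` takes the single-file branch.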
Deprecated options:
- `spark.shuffle.s3.forceBypassMergeSort`
- `spark.shuffle.s3.allowSerializedShuffle`
- `spark.shuffle.s3.sort.cloneRecords`
CI:
- Use Travis CI to build releases.
- Added Spark 3.3.2 as a build target.
Release version 0.5
Changes from the initial open-source release:
- Refactored the Scala classes from the `sort` package into the `shuffle` package. SparkS3Shuffle needs to be activated with `--conf spark.shuffle.manager="org.apache.spark.shuffle.S3ShuffleManager" --conf spark.shuffle.sort.io.plugin.class="org.apache.spark.shuffle.S3ShuffleDataIO"`.
- Created an `io` plugin class so that more of the Spark shuffle infrastructure can be leveraged.
- Integrated SerializedShuffle (can be disabled with `spark.shuffle.s3.allowSerializedShuffle`).
- Added the flag `spark.shuffle.s3.sort.cloneRecords`, which copies `Array[_]` key/value pairs in `SortShuffle` before they are inserted into the Spark `ExternalSorter`.
- Migrated from Maven to sbt.
- GitHub Actions creates builds for Spark 3.2.0, 3.2.1, and 3.3.0, for both Scala 2.12 and 2.13.
- Added additional tests for Sort Shuffle.
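One plausible reason for copying `Array[_]` records (an assumption here, not stated in the notes) is object reuse. If an upstream iterator reuses a single mutable array for every record, buffering references, as a sorter does, keeps only the last value. The sketch below simulates this with a hypothetical `reusingIterator`. It illustrates the hazard and is not the plugin's code.

```scala
object CloneRecordsSketch {
  // Hypothetical upstream iterator that reuses one mutable Array for every record.
  def reusingIterator(n: Int): Iterator[Array[Int]] = {
    val buffer = new Array[Int](1)
    Iterator.range(0, n).map { i =>
      buffer(0) = i
      buffer
    }
  }

  // Buffer records as-is: every element is a reference to the same array.
  def bufferWithoutCopy(n: Int): List[Int] =
    reusingIterator(n).toList.map(_(0))

  // Clone each array before buffering, analogous to copying records before sorter insertion.
  def bufferWithCopy(n: Int): List[Int] =
    reusingIterator(n).map(_.clone()).toList.map(_(0))

  def main(args: Array[String]): Unit = {
    println(bufferWithoutCopy(3)) // List(2, 2, 2)
    println(bufferWithCopy(3))    // List(0, 1, 2)
  }
}
```

Copying costs an extra allocation per record. This may explain why such behavior would sit behind a flag rather than being always on.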
Initial release with SortShuffle fix.
v0.4 CI: Fix version in build. (#5)