Release version 0.5
Changes from the initial open-source release:
- Refactored the Scala classes from the `sort` into the `shuffle` package: SparkS3Shuffle needs to be activated with `--conf spark.shuffle.manager="org.apache.spark.shuffle.S3ShuffleManager" --conf spark.shuffle.sort.io.plugin.class="org.apache.spark.shuffle.S3ShuffleDataIO"`.
- Creation of an `io` plugin class so that we are able to leverage more of the Spark Shuffle infrastructure.
- Integration of SerializedShuffle (can be disabled with `spark.shuffle.s3.allowSerializedShuffle`).
- Added flag `spark.shuffle.s3.sort.cloneRecords`, which copies `Array[_]` key/value pairs in `SortShuffle` before insertion into the Spark `ExternalSorter`.
- Migrated from Maven to sbt.
- GitHub Actions creates builds for Spark 3.2.0, 3.2.1 and 3.3.0 for both Scala 2.12 and 2.13.
- Added additional tests for Sort Shuffle.
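
As a rough illustration of how the settings named in this release fit together, the sketch below assembles a `spark-submit` command line. The two activation settings are taken from the notes above. The boolean values given for `allowSerializedShuffle` and `cloneRecords`, the application class `com.example.MyApp` and the jar `my-app.jar` are assumptions made for the example, and a real deployment will likely need further settings that are not covered here.

```shell
#!/bin/sh
# Activation settings from the release notes:
# the shuffle manager and the shuffle I/O plugin class.
S3_SHUFFLE_CONF="--conf spark.shuffle.manager=org.apache.spark.shuffle.S3ShuffleManager \
--conf spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.S3ShuffleDataIO"

# Optional flags mentioned in the notes.
# ASSUMPTION: both are plain booleans; the values below are illustrative.
S3_SHUFFLE_OPTS="--conf spark.shuffle.s3.allowSerializedShuffle=false \
--conf spark.shuffle.s3.sort.cloneRecords=true"

# HYPOTHETICAL application class and jar; replace with your own.
APP_ARGS="--class com.example.MyApp my-app.jar"

# Dry run: print the assembled command instead of executing it.
# Remove `echo` to actually submit the job.
echo spark-submit $S3_SHUFFLE_CONF $S3_SHUFFLE_OPTS $APP_ARGS
```

Printing the command first makes it easy to check that both activation settings are present before submitting, since the release notes list the shuffle manager and the I/O plugin class together.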