
Release version 0.5

Released by @pspoerri on 13 Jul at 15:49
Commit: 3d2b56d

Changes from the initial open-source release:

  • Refactored the Scala classes from the sort package into the shuffle package. SparkS3Shuffle must now be activated with:
     --conf spark.shuffle.manager="org.apache.spark.shuffle.S3ShuffleManager"
     --conf spark.shuffle.sort.io.plugin.class="org.apache.spark.shuffle.S3ShuffleDataIO"
  • Added an IO plugin class (S3ShuffleDataIO) so that more of Spark's shuffle infrastructure can be reused.
  • Integrated SerializedShuffle; it can be disabled by setting spark.shuffle.s3.allowSerializedShuffle=false.
  • Added the flag spark.shuffle.s3.sort.cloneRecords, which copies Array[_] key/value pairs in SortShuffle before they are inserted into Spark's ExternalSorter.
  • Migrated the build from Maven to sbt.
  • GitHub Actions now builds the project for Spark 3.2.0, 3.2.1 and 3.3.0, each with Scala 2.12 and 2.13.
  • Added more tests for Sort Shuffle.
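For applications that build their configuration in code instead of passing flags to spark-submit, the two activation settings from these notes can presumably be set on a SparkConf before the SparkContext or SparkSession is created (the shuffle manager cannot be changed afterwards). This is a minimal sketch assuming spark-core is on the classpath; the object name S3ShuffleConfSketch and the application name are hypothetical placeholders, and any further settings SparkS3Shuffle may require (for example, where shuffle data is stored) are not shown.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: builds a SparkConf equivalent to the two --conf flags
// listed in the release notes. The app name is a placeholder, and other
// settings the project may need are intentionally omitted.
object S3ShuffleConfSketch {
  def conf(): SparkConf =
    new SparkConf()
      .setAppName("s3-shuffle-example")
      .set("spark.shuffle.manager", "org.apache.spark.shuffle.S3ShuffleManager")
      .set("spark.shuffle.sort.io.plugin.class", "org.apache.spark.shuffle.S3ShuffleDataIO")
}
```

The resulting SparkConf can be handed to SparkSession.builder().config(...). To opt out of the serialized shuffle path, the same chain would add .set("spark.shuffle.s3.allowSerializedShuffle", "false"), assuming the flag takes a boolean value.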
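The notes do not explain why spark.shuffle.s3.sort.cloneRecords exists. One common reason for such a flag, assumed here rather than confirmed, is that an upstream iterator may reuse the same mutable array object for every record; a sorter that buffers references would then see only the last contents written into that array. The plain-Scala sketch below (no Spark required) illustrates the effect of copying Array[_] keys and values before buffering them. All names in it (CloneRecordsSketch, cloneIfArray, buffer, reusingProducer) are hypothetical and are not part of SparkS3Shuffle or Spark.

```scala
// Hypothetical illustration of cloning Array[_] records before buffering.
// This is NOT the SparkS3Shuffle implementation.
object CloneRecordsSketch {

  // Shallow-copy a key or value if it is an Array[_]; return anything else unchanged.
  def cloneIfArray(x: Any): Any = x match {
    case arr: Array[_] => arr.clone()
    case other         => other
  }

  // Buffer records by reference, as a sorter would, optionally cloning
  // Array[_] keys/values first (stand-in for the cloneRecords flag).
  def buffer(records: Iterator[(Any, Any)], cloneRecords: Boolean): Vector[(Any, Any)] =
    records.map { case (k, v) =>
      if (cloneRecords) (cloneIfArray(k), cloneIfArray(v)) else (k, v)
    }.toVector

  // A producer that reuses one mutable Array[Int] as the value of every
  // record, overwriting its contents before yielding the next record.
  def reusingProducer(n: Int): Iterator[(Any, Any)] = {
    val shared = new Array[Int](1)
    Iterator.range(0, n).map { i =>
      shared(0) = i
      (i, shared)
    }
  }

  def main(args: Array[String]): Unit = {
    val withoutClone = buffer(reusingProducer(3), cloneRecords = false)
    val withClone    = buffer(reusingProducer(3), cloneRecords = true)
    // Every buffered value points at the same array, so all show the last write.
    println(withoutClone.map(_._2.asInstanceOf[Array[Int]].toList)) // Vector(List(2), List(2), List(2))
    // Each value was copied when it was buffered, so the original contents survive.
    println(withClone.map(_._2.asInstanceOf[Array[Int]].toList))    // Vector(List(0), List(1), List(2))
  }
}
```

The copy is shallow: arrays nested inside an Array[_] would still be shared, which is one of the trade-offs such a flag would have to weigh against the extra allocation per record.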