[MINOR][DOC] update the condition description of BypassMergeSortShuffle
## What changes were proposed in this pull request?
The following three condition descriptions should be updated, following apache#23228:
<li>no Ordering is specified,</li>
<li>no Aggregator is specified, and</li>
<li>the number of partitions is less than
<code>spark.shuffle.sort.bypassMergeThreshold</code>.
</li>
1. If the shuffle dependency specifies aggregation but only aggregates on the reduce side, BypassMergeSortShuffle can still be used.
2. If the number of output partitions equals spark.shuffle.sort.bypassMergeThreshold (e.g. 200), BypassMergeSortShuffle can still be used.
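
The two points above can be sketched as a small predicate. This is only an illustration of the updated condition, not Spark's actual code: the class `BypassCheckSketch`, the method name, and its parameters are hypothetical, and the threshold of 200 is the example value mentioned above.

```java
/**
 * Hypothetical sketch of the bypass condition described in this commit message.
 * None of these names come from Spark itself; they only mirror the prose.
 */
final class BypassCheckSketch {

    /** Example value of spark.shuffle.sort.bypassMergeThreshold used in the text. */
    static final int EXAMPLE_BYPASS_MERGE_THRESHOLD = 200;

    private BypassCheckSketch() {
        // Static helper only; not meant to be instantiated.
    }

    /**
     * @param mapSideCombine       true if the shuffle dependency aggregates on the map side
     * @param numPartitions        number of output (reduce) partitions
     * @param bypassMergeThreshold configured value of spark.shuffle.sort.bypassMergeThreshold
     * @return true if the bypass-merge-sort write path may be selected
     */
    static boolean shouldBypassMergeSort(
            boolean mapSideCombine, int numPartitions, int bypassMergeThreshold) {
        // Point 1: only map-side combine rules the bypass path out.
        // An aggregator that runs solely on the reduce side does not.
        if (mapSideCombine) {
            return false;
        }
        // Point 2: the comparison is inclusive, so a partition count equal
        // to the threshold still qualifies.
        return numPartitions <= bypassMergeThreshold;
    }
}
```

Under these assumptions, a dependency with reduce-side-only aggregation and exactly 200 partitions passes the check, while 201 partitions or any map-side combine does not.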

## How was this patch tested?
N/A

Closes apache#23281 from lcqzte10192193/wid-lcq-1211.

Authored-by: lichaoqun <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
lcqzte10192193 authored and srowen committed Dec 13, 2018
1 parent f372609 commit f69998a
Showing 1 changed file with 2 additions and 3 deletions.
@@ -58,9 +58,8 @@
  * simultaneously opens separate serializers and file streams for all partitions. As a result,
  * {@link SortShuffleManager} only selects this write path when
  * <ul>
- * <li>no Ordering is specified,</li>
- * <li>no Aggregator is specified, and</li>
- * <li>the number of partitions is less than
+ * <li>no map-side combine is specified, and</li>
+ * <li>the number of partitions is less than or equal to
  * <code>spark.shuffle.sort.bypassMergeThreshold</code>.</li>
  * </ul>
  *
