You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hash: hash partition both sides and shuffle to the same worker to join
Random left + Broadcast right: for non-equal join, random distribute left side and broadcast right side to join
Colocated: when left and right table are partitioned the same way, join locally without shuffling
Lookup: when right table is a dimension table, and join is equality join on right table's primary key, join locally by looking up the lookup table of the dimension table
Both 3 and 4 can avoid data shuffling, and we want to add more strategies that can reduce data shuffling:
When right table is fully replicated to all servers (similar to dimension table, but not necessary with a lookup table), we can join locally
Broadcast the right table, and join on left table local worker
When left table is partitioned and the join is on the partition key, partition the right table, and join on left table local worker
The text was updated successfully, but these errors were encountered:
Currently we support 4 JOIN algorithms:
Both 3 and 4 can avoid data shuffling, and we want to add more strategies that can reduce data shuffling:
The text was updated successfully, but these errors were encountered: