[CH] Shuffle writer connects to CH pipeline #6723

Merged: 11 commits merged on Sep 11, 2024

liuneng1994 (Contributor) commented Aug 6, 2024

What changes were proposed in this pull request?

The shuffle writer can now be plugged into the ClickHouse pipeline as a Processor.
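In fallback mode, the stage is executed as a loop inside JNI. The main reason is that some fallback cases involve Spark's whole-stage code generation, and the generated code uses TaskContext, so execution must stay on the task thread.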

Removed CachedShuffleWriter; the new SparkExchangeSink behaves the same as the original shuffle writer.

The following changes were also made:

  • Support for native inputFileName, InputBlockStart and InputBlockLength
  • Shuffle wall-time statistics: the complete shuffle time is now measured at the Processor level
  • Moved LocalExecutor out of SerializedPlanParser
  • Fixed the mismatch between the output headers of DefaultHashAggregateResultStep and DefaultHashAggregateResultTransform

How was this patch tested?

unit tests


github-actions bot commented Aug 6, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Aug 6, 2024

Run Gluten Clickhouse CI

github-actions bot added the CORE label (works for Gluten Core) on Aug 23, 2024
github-actions bot removed the CORE label (works for Gluten Core) on Sep 9, 2024
github-actions bot added the CORE label (works for Gluten Core) on Sep 9, 2024
github-actions bot removed the CORE label (works for Gluten Core) on Sep 10, 2024
```cpp
auto * current_executor = local_engine::LocalExecutor::getCurrentExecutor();
chassert(current_executor);
local_engine::SplitterHolder * splitter = nullptr;
// handle fallback, whole stage fallback or partial fallback
```
Contributor

In which cases can current_executor be nullptr, and in which not? It would be better to add a comment here.

Contributor Author

done

@lgbo-ustc
Contributor

LGTM

liuneng1994 merged commit 5f65501 into apache:main on Sep 11, 2024
45 checks passed
baibaichen added a commit to Kyligence/gluten that referenced this pull request Sep 18, 2024
baibaichen added a commit that referenced this pull request Sep 18, 2024
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240918)

* Fix build due to ENABLE_ROCKSDB=OFF caused by #7239
* Fix UT build due to #6723
* Fix UT build due to #7193
* Fix Build due to ClickHouse/ClickHouse#69298

---------

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
3 participants