
[core][format] Optimize manifest reading performance, add pushdown for manifest and orc. #4497

Open · wants to merge 32 commits into base: master

Conversation

@ranxianglei (Contributor) commented Nov 11, 2024

Purpose

Optimize manifest reading performance and format object creation performance; in practical tests this reduces total manifest reading time to under 3 ms (with further optimization possible, down to under 1 ms). With metadata caching enabled, ORC pushdown enabled, and the metadata format switched to ORC, this can serve high-concurrency (QPS above 10,000), low-latency (overall RT under 50 ms) scenarios.
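
For reference, switching the metadata format to ORC as described above goes through Paimon's `manifest.format` table option (default avro). A minimal sketch, with the `Options` usage purely illustrative and not part of this PR's diff:

```java
import org.apache.paimon.options.Options;

// Minimal sketch: store manifest files as ORC instead of the default Avro so
// that ORC pushdown can apply to metadata reads as well.
Options options = new Options();
options.setString("manifest.format", "orc");
```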

Linked issue: close #xxx

Tests

API and Format

Documentation

@ranxianglei (Contributor, Author)

To be used together with #4231.

@ranxianglei ranxianglei changed the title [core][format] Optimize manifest reading performance,add pushdown for manifest . [core][format] Optimize manifest reading performance,add pushdown for manifest and orc. Nov 12, 2024
@ranxianglei ranxianglei reopened this Nov 13, 2024
@ranxianglei ranxianglei reopened this Nov 15, 2024
@ranxianglei ranxianglei reopened this Nov 18, 2024
@ranxianglei (Contributor, Author)

Note: since the cache code related to manifest and FileFormat has been withdrawn from this PR and deferred to the next PR, this PR alone does not yet reach the performance described in the Purpose.


return Optional.empty();
FileFormatFactory fileFormatFactory =
FactoryUtil.discoverFactory(
@JingsongLi (Contributor)

Can you just create a PR for FileFormatFactory?

@ranxianglei (Contributor, Author)

@JingsongLi Of course, but I'll change it in a few days; I've been a bit busy lately.

* Read the corresponding entries for the currently required buckets, pushing the filter down
* into the file format.
*/
private static List<Predicate> createPushDownFilter(Collection<Integer> buckets) {
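
As an aside for readers of this hunk, here is a minimal sketch of what a bucket pushdown filter of this shape could look like. The row type and bucket column index below are assumptions for illustration (the real schema comes from ManifestEntry), not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import org.apache.paimon.predicate.Predicate;
import org.apache.paimon.predicate.PredicateBuilder;
import org.apache.paimon.types.DataTypes;
import org.apache.paimon.types.RowType;

public class BucketPushDownSketch {

    // Hypothetical stand-in for the manifest entry row type; in Paimon the
    // real schema (and the bucket column's position) comes from ManifestEntry.
    private static final RowType ENTRY_TYPE =
            RowType.of(DataTypes.TINYINT(), DataTypes.BYTES(), DataTypes.INT());
    private static final int BUCKET_FIELD_INDEX = 2; // assumption for this sketch

    static List<Predicate> createPushDownFilter(Collection<Integer> buckets) {
        if (buckets == null || buckets.isEmpty()) {
            return null; // no filter: read all entries
        }
        // bucket IN (...) lets an ORC reader skip stripes/row groups whose
        // bucket statistics cannot match any requested bucket.
        PredicateBuilder builder = new PredicateBuilder(ENTRY_TYPE);
        Predicate inBuckets =
                builder.in(BUCKET_FIELD_INDEX, new ArrayList<Object>(buckets));
        return Collections.singletonList(inBuckets);
    }
}
```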
@Aitozi (Contributor)

Does the query performance gain come mainly from pushing the bucket field down into the ORC manifest file?

@ranxianglei (Contributor, Author)

More than half of the improvement comes from the ORC pushdown on the manifest; another part comes from optimizing OrcFileFormat creation, and the rest from caching some expensive object operations in Scan. @Aitozi

@ranxianglei (Contributor, Author)

Together with #4231, bucket data is read with ORC pushdown. For test results see issue #4586; the current ORC implementation is more than 10x faster than Parquet. @Aitozi

entryType,
fileFormat.createReaderFactory(entryType),
fileFormat.createReaderFactory(entryType, filters),
@Aitozi (Contributor) commented Nov 28, 2024

If we enable the reader filter and the manifest cache, will we miss data from other buckets when reading data from bucket-x? Previously, data was stored in ObjectCache after passing through the loadFilter, but now it must pass through this filter first.

@ranxianglei (Contributor, Author)

If ObjectCache is enabled and withBuckets pushdown is used, the problem you mention can indeed occur. I originally planned to add a Filter condition to ObjectCache, but the change was too complicated and I didn't have the time, so for now I only did the withBuckets pushdown. In most scenarios there is no problem: in Flink or Spark, as far as I can see, withBuckets is never called at all; and in OLAP queries where the corresponding bucket is read per segment, the bucket-to-segment mapping stays fixed, so no problem arises.
Were it not for this consideration, I would suggest pushing the partition down as well.
If you feel the risk is too great, you can even turn off the manifest's metadata cache; performance still improves significantly. @Aitozi
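
For what it's worth, one way the "Filter condition on ObjectCache" idea above could avoid the cross-bucket problem is to fold the bucket set into the cache key, so filtered and unfiltered reads never share an entry. Everything below is a hypothetical sketch, not Paimon's ObjectCache API:

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical composite cache key: the same manifest file read with different
// bucket filters lands in different cache slots, so a bucket-filtered result
// can never be served to a read that wants other buckets.
final class ManifestCacheKey {
    private final String manifestPath;
    private final Set<Integer> buckets; // null = no bucket filter (full read)

    ManifestCacheKey(String manifestPath, Set<Integer> buckets) {
        this.manifestPath = manifestPath;
        this.buckets = buckets;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ManifestCacheKey)) {
            return false;
        }
        ManifestCacheKey that = (ManifestCacheKey) o;
        return manifestPath.equals(that.manifestPath)
                && Objects.equals(buckets, that.buckets);
    }

    @Override
    public int hashCode() {
        return Objects.hash(manifestPath, buckets);
    }
}
```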

@Aitozi (Contributor)

Thanks for your explanation. If we cannot handle the pushdown correctly when the cache is enabled, I think we can disable the filter pushdown whenever the cache is enabled.
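
A minimal sketch of that guard (the `manifestCacheEnabled` flag is hypothetical, and `createPushDownFilter` refers to the illustrative sketch earlier in this thread):

```java
import java.util.Collection;
import java.util.List;

import org.apache.paimon.predicate.Predicate;

class PushDownGuardSketch {
    // Sketch: never push bucket filters into the file format while the
    // manifest cache is on, so cached entries are never a filtered subset.
    static List<Predicate> effectiveFilters(
            boolean manifestCacheEnabled, Collection<Integer> buckets) {
        return manifestCacheEnabled
                ? null // load everything; filter in memory after the cache
                : BucketPushDownSketch.createPushDownFilter(buckets);
    }
}
```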

@ranxianglei (Contributor, Author)

Between metadata caching and manifest pushdown, I recommend the latter. The performance of Paimon's ObjectCache implementation is very low; in testing it is sometimes not even as fast as manifest pushdown. I will submit a later PR to fix ObjectCache's performance. @Aitozi

@JingsongLi (Contributor)

@Aitozi This is a scenario quite different from mainstream usage in the community. The author's internal analysis engine has no central planning node; each compute node plans for itself and cares only about its own bucket.

Actually, this works more like a manifest cache in the writer node than like the current design.

@Aitozi (Contributor) commented Dec 2, 2024

@JingsongLi In the writer node, it may still need to read more than one bucket's entries from the manifest if the parallelism is lower than the number of buckets.

@JingsongLi (Contributor)

@Aitozi That is true; there are problems in this PR's implementation.

@ranxianglei (Contributor, Author)

[screenshot]
Great! Reading more than 2 GB of metadata in one go.

@Aitozi (Contributor) commented Nov 28, 2024

@ranxianglei Thanks for your work; happy to see some effort to improve manifest file reading performance. I left two comments.

@JingsongLi (Contributor)

Hi @ranxianglei,
You can create multiple PRs to complete multiple optimizations, but currently various changes are still mixed together, and each change requires a lot of discussion, performance testing, and evaluation of behavior changes.

The purpose of a PR and its review is not to achieve great accomplishments within a single PR, but to deliver higher quality code and better architecture.

@ranxianglei (Contributor, Author)

I've been quite busy lately. I'll split the PR when I'm done. @JingsongLi
