[feat](mtmv)mtmv support paimon partition refresh(#43959,#44911) #45660

Open
wants to merge 4 commits into
base: branch-3.0

zddr (Contributor) commented Dec 19, 2024

pick: #44911 #43959

Only the Paimon-related code is picked; some of the code related to MTMV REFRESH is not.

zddr added 2 commits December 19, 2024 18:47
### What problem does this PR solve?
Previously, an MTMV built on a Paimon table could not detect changes in
the partition list or the data, so the only option was to force a full
refresh with `refresh materialized view mv1 complete`.

This PR obtains the partition list of the Paimon table, the last update
time of each partition, and the latest snapshotId of the table.

As a result, an MTMV can be partitioned based on a Paimon table, detect
data changes, and automatically refresh the affected partitions.
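
The change detection described above can be sketched as a comparison between the per-partition update times recorded at the previous refresh and the ones currently reported by Paimon. This is only an illustration under stated assumptions: `PartitionChangeDetector` and `partitionsToRefresh` are hypothetical names, not classes or methods from this PR, and the table-level snapshotId comparison is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not a class from the Doris code base. */
final class PartitionChangeDetector {
    private PartitionChangeDetector() {}

    /**
     * Returns the sorted names of partitions that need a refresh: partitions that
     * are new, or whose last update time is later than the recorded one.
     */
    static List<String> partitionsToRefresh(Map<String, Long> recordedUpdateTimes,
                                            Map<String, Long> currentUpdateTimes) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : currentUpdateTimes.entrySet()) {
            Long recorded = recordedUpdateTimes.get(entry.getKey());
            // A missing record means the partition did not exist at the last refresh.
            if (recorded == null || entry.getValue() > recorded) {
                stale.add(entry.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }
}
```

Partitions whose update time is unchanged are skipped, which is what makes a partition-level refresh cheaper than a complete one.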

### Release note
MTMV supports partition refresh for Paimon tables.
…the latest data (apache#44911)

Problem Summary:
- Add `PaimonMetadataCacheMgr` to `ExternalMetaCacheMgr` to manage the
snapshotCache of Paimon tables
- Move paimonSchemaCache to `PaimonMetadataCacheMgr`, and add schemaId as
part of the key
- `PaimonExternalTable` overrides the methods in `ExternalTable` and
supports partition pruning
- `PaimonExternalTable` implements the `MvccTable` interface, so that
queries read snapshot data from the cache; this prevents a cache refresh
from causing different versions of metadata to be used in a single
query
- `MTMVTask` retrieves the snapshot data of each `MvccTable` before the
task starts, so that a cache refresh cannot cause different versions of
metadata to be used in a single refresh task
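
To illustrate why schemaId is made part of the cache key, here is a minimal sketch of a schema cache keyed by table name plus schema ID. With such a key, a query pinned to an older snapshot can still look up the schema version that snapshot refers to, even after a newer schema has been cached. `SchemaKey` and `VersionedSchemaCache` are hypothetical names for illustration, and schemas are represented as plain strings; the actual cache in this PR may be structured differently.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache key: a table name plus the ID of one schema version. */
final class SchemaKey {
    private final String tableName;
    private final long schemaId;

    SchemaKey(String tableName, long schemaId) {
        this.tableName = Objects.requireNonNull(tableName);
        this.schemaId = schemaId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SchemaKey)) {
            return false;
        }
        SchemaKey other = (SchemaKey) o;
        return schemaId == other.schemaId && tableName.equals(other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tableName, schemaId);
    }
}

/** Hypothetical cache that keeps every schema version of a table side by side. */
final class VersionedSchemaCache {
    private final Map<SchemaKey, String> schemas = new ConcurrentHashMap<>();

    void put(String tableName, long schemaId, String schema) {
        schemas.put(new SchemaKey(tableName, schemaId), schema);
    }

    /** Returns the schema for exactly this version, or null if it is not cached. */
    String get(String tableName, long schemaId) {
        return schemas.get(new SchemaKey(tableName, schemaId));
    }
}
```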

Queries on Paimon tables now read the data in the cache instead of the latest data.

Behavior changes when querying a Paimon table:
- When FE has just started, queries return the latest data
- After Paimon data changes, Doris still returns the previous data
- After the snapshot cache expires, Doris returns the latest data
- `desc paimon;` displays the schema corresponding to the snapshotId in the
snapshot cache
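
The first three behaviors listed above follow from any time-based cache. A minimal sketch, assuming a single cached snapshot ID with a fixed time-to-live: `ExpiringSnapshotCache` is a hypothetical class, the two suppliers stand in for "ask Paimon for the latest snapshot" and "read the clock", and the real cache in this PR may use a different expiry policy.

```java
import java.util.function.LongSupplier;

/** Hypothetical single-entry cache for illustration; not the cache used in Doris. */
final class ExpiringSnapshotCache {
    private final LongSupplier latestSnapshotId; // stand-in for a metadata call to Paimon
    private final LongSupplier clockMillis;      // injectable clock, so expiry is testable
    private final long ttlMillis;

    private boolean loaded;
    private long cachedSnapshotId;
    private long loadedAtMillis;

    ExpiringSnapshotCache(LongSupplier latestSnapshotId, LongSupplier clockMillis, long ttlMillis) {
        this.latestSnapshotId = latestSnapshotId;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached snapshot ID, reloading it only on first use or after expiry. */
    synchronized long snapshotId() {
        long now = clockMillis.getAsLong();
        if (!loaded || now - loadedAtMillis >= ttlMillis) {
            cachedSnapshotId = latestSnapshotId.getAsLong();
            loadedAtMillis = now;
            loaded = true;
        }
        return cachedSnapshotId;
    }
}
```

The first call after startup loads the latest snapshot; calls made before the entry expires keep returning the same snapshot even if the source has moved on; the first call after expiry picks up the newer snapshot.
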
Thearas (Contributor) commented Dec 19, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

zddr (Contributor Author) commented Dec 19, 2024

run buildall
