[feat](mtmv)mtmv support paimon partition refresh #43959

Merged
merged 29 commits into from
Nov 21, 2024

@zddr zddr commented Nov 14, 2024

What problem does this PR solve?

Previously, an MTMV built on a Paimon table could not detect changes to the table's partition list or data, so the only option was to force a full refresh with `refresh materialized view mv1 complete`.

This PR obtains the Paimon table's partition list, the last update time of each partition, and the latest snapshotId of the table.

As a result, an MTMV can be partitioned by a Paimon table, detect data changes, and automatically refresh the affected partitions.
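
The change-detection idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in this PR: `PartitionSnapshot`, `partitions_to_refresh`, `partitions_to_drop`, and `table_changed` are hypothetical names, and the assumption is that the MTMV records each partition's last update time (and the table's snapshotId) at refresh time and compares them with the current values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionSnapshot:
    """Hypothetical record of one base-table partition as seen at a point in time."""
    last_update_time_ms: int


def partitions_to_refresh(
    recorded: Dict[str, PartitionSnapshot],
    current: Dict[str, PartitionSnapshot],
) -> List[str]:
    """Return partitions that are new or whose last update time differs from the recorded one."""
    stale = []
    for name, snapshot in current.items():
        previous = recorded.get(name)
        if previous is None or previous.last_update_time_ms != snapshot.last_update_time_ms:
            stale.append(name)
    return sorted(stale)


def partitions_to_drop(
    recorded: Dict[str, PartitionSnapshot],
    current: Dict[str, PartitionSnapshot],
) -> List[str]:
    """Return partitions that were recorded earlier but no longer exist in the base table."""
    return sorted(set(recorded) - set(current))


def table_changed(recorded_snapshot_id: int, latest_snapshot_id: int) -> bool:
    """Table-level check (assumed): any new snapshotId means the table has changed."""
    return recorded_snapshot_id != latest_snapshot_id


# State recorded at the last refresh versus the state read from the table now.
recorded = {
    "dt=2024-11-13": PartitionSnapshot(100),
    "dt=2024-11-14": PartitionSnapshot(200),
}
current = {
    "dt=2024-11-14": PartitionSnapshot(250),  # updated since the last refresh
    "dt=2024-11-15": PartitionSnapshot(300),  # newly added partition
}

print(partitions_to_refresh(recorded, current))
print(partitions_to_drop(recorded, current))
```

With per-partition metadata like this, only the changed or new partitions need to be rebuilt, which is what makes a forced full refresh unnecessary.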

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
MTMV supports Paimon partition refresh

Release note

MTMV supports Paimon partition refresh

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

zddr commented Nov 14, 2024

run buildall

zddr commented Nov 15, 2024

run buildall

zddr commented Nov 15, 2024

run buildall

zddr commented Nov 18, 2024

run external

zddr commented Nov 18, 2024

run buildall

zddr commented Nov 18, 2024

run buildall

zddr commented Nov 19, 2024

run buildall

zddr commented Nov 19, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 44860 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 96d7beab7a4e66fd34f4d599e96883f7f8708b3e, data reload: false

------ Round 1 ----------------------------------
q1	17583	7817	7266	7266
q2	2221	1166	1161	1161
q3	9964	1132	1163	1132
q4	10230	779	733	733
q5	7573	2640	2631	2631
q6	242	150	146	146
q7	987	609	611	609
q8	9378	2357	2360	2357
q9	6662	6500	6411	6411
q10	7015	2337	2294	2294
q11	473	263	268	263
q12	409	209	217	209
q13	17771	2984	2992	2984
q14	240	204	203	203
q15	565	534	523	523
q16	688	585	591	585
q17	971	593	545	545
q18	7264	6760	6727	6727
q19	1318	998	972	972
q20	2901	2783	2729	2729
q21	3969	3297	3056	3056
q22	1391	1350	1324	1324
Total cold run time: 109815 ms
Total hot run time: 44860 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7258	7250	7373	7250
q2	341	239	239	239
q3	3097	3020	3008	3008
q4	2080	1867	1787	1787
q5	5585	5681	5682	5681
q6	222	137	138	137
q7	2202	1857	1823	1823
q8	3297	3511	3500	3500
q9	8850	8832	8838	8832
q10	3602	3602	3567	3567
q11	602	507	492	492
q12	824	639	614	614
q13	11204	3240	3230	3230
q14	321	271	287	271
q15	569	506	519	506
q16	676	631	629	629
q17	1872	1625	1621	1621
q18	8297	7591	7655	7591
q19	1721	1618	1436	1436
q20	2114	1885	1874	1874
q21	5567	5368	5396	5368
q22	668	550	584	550
Total cold run time: 70969 ms
Total hot run time: 60006 ms

@doris-robot

ClickBench: Total hot run time: 31.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 96d7beab7a4e66fd34f4d599e96883f7f8708b3e, data reload: false

query1	0.03	0.03	0.06
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.63	0.11	0.10
query5	0.42	0.42	0.40
query6	1.18	0.66	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.04
query9	0.58	0.50	0.51
query10	0.55	0.55	0.56
query11	0.13	0.10	0.10
query12	0.13	0.11	0.11
query13	0.61	0.61	0.59
query14	2.71	2.71	2.79
query15	0.91	0.83	0.82
query16	0.39	0.37	0.38
query17	1.07	1.00	1.06
query18	0.19	0.21	0.20
query19	1.93	1.87	2.01
query20	0.01	0.01	0.01
query21	15.38	0.58	0.58
query22	2.82	1.85	1.83
query23	17.04	1.10	0.74
query24	2.92	1.46	0.38
query25	0.23	0.12	0.08
query26	0.53	0.13	0.13
query27	0.05	0.05	0.04
query28	10.99	1.11	1.08
query29	12.56	3.24	3.22
query30	0.25	0.06	0.06
query31	2.87	0.38	0.37
query32	3.26	0.47	0.47
query33	2.95	3.01	3.02
query34	17.23	4.48	4.49
query35	4.64	4.56	4.49
query36	0.66	0.50	0.49
query37	0.10	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 107.71 s
Total hot run time: 31.9 s

zddr commented Nov 19, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 45331 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cd30e5c44cf975127b5921eadaedbbd03a92bd9d, data reload: false

------ Round 1 ----------------------------------
q1	17584	7530	7257	7257
q2	2226	1160	1178	1160
q3	9966	1188	1119	1119
q4	10228	741	748	741
q5	7631	2809	2736	2736
q6	244	151	147	147
q7	1003	634	629	629
q8	9355	2372	2381	2372
q9	6713	6484	6455	6455
q10	6967	2284	2359	2284
q11	464	262	253	253
q12	416	205	205	205
q13	17765	3013	3078	3013
q14	242	224	209	209
q15	572	522	522	522
q16	630	585	564	564
q17	984	585	611	585
q18	7433	6844	6794	6794
q19	1341	1019	1023	1019
q20	2927	2720	2699	2699
q21	3963	3226	3245	3226
q22	1384	1342	1346	1342
Total cold run time: 110038 ms
Total hot run time: 45331 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7281	7228	7333	7228
q2	336	231	240	231
q3	3089	2933	2966	2933
q4	2070	1840	1820	1820
q5	5739	5541	5470	5470
q6	217	142	140	140
q7	2132	1747	1710	1710
q8	3269	3429	3420	3420
q9	8627	8619	8569	8569
q10	3490	3455	3437	3437
q11	605	487	490	487
q12	770	601	577	577
q13	10865	3018	3044	3018
q14	284	266	257	257
q15	569	507	493	493
q16	664	640	655	640
q17	1809	1597	1561	1561
q18	7891	7612	7610	7610
q19	1677	1533	1541	1533
q20	2078	1813	1808	1808
q21	5393	5246	5236	5236
q22	640	554	548	548
Total cold run time: 69495 ms
Total hot run time: 58726 ms

@doris-robot

ClickBench: Total hot run time: 32.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cd30e5c44cf975127b5921eadaedbbd03a92bd9d, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.04
query3	0.23	0.06	0.07
query4	1.63	0.11	0.10
query5	0.43	0.41	0.41
query6	1.18	0.65	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.56	0.52	0.49
query10	0.56	0.55	0.56
query11	0.14	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.61	0.60
query14	2.70	2.72	2.85
query15	0.90	0.83	0.83
query16	0.38	0.38	0.37
query17	1.05	1.06	1.02
query18	0.24	0.23	0.21
query19	1.87	1.87	2.02
query20	0.02	0.00	0.02
query21	15.36	0.58	0.57
query22	2.33	2.54	1.64
query23	16.93	0.87	0.81
query24	3.34	0.60	1.37
query25	0.20	0.18	0.24
query26	0.34	0.13	0.15
query27	0.04	0.05	0.04
query28	10.81	1.09	1.07
query29	12.53	3.23	3.25
query30	0.25	0.07	0.06
query31	2.86	0.38	0.37
query32	3.30	0.46	0.46
query33	3.05	3.00	3.03
query34	16.88	4.45	4.46
query35	4.46	4.48	4.53
query36	0.68	0.48	0.50
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.66 s
Total hot run time: 32.07 s

@zddr zddr requested a review from morrySnow November 19, 2024 12:22
zddr commented Nov 20, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 45480 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 84cb3364c70fa46d6117a43136de8295be0a02b4, data reload: false

------ Round 1 ----------------------------------
q1	17731	7407	7331	7331
q2	2223	1164	1166	1164
q3	9972	1209	1186	1186
q4	10219	739	786	739
q5	7599	2751	2732	2732
q6	238	154	156	154
q7	987	620	599	599
q8	9364	2368	2360	2360
q9	6703	6519	6469	6469
q10	7018	2318	2347	2318
q11	470	267	264	264
q12	417	227	215	215
q13	17773	3046	3070	3046
q14	240	217	215	215
q15	583	536	530	530
q16	663	582	595	582
q17	1010	542	549	542
q18	7307	6725	6829	6725
q19	1334	1051	1017	1017
q20	2899	2691	2701	2691
q21	4026	3296	3264	3264
q22	1407	1337	1339	1337
Total cold run time: 110183 ms
Total hot run time: 45480 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7463	7294	7639	7294
q2	342	235	245	235
q3	3073	3063	2961	2961
q4	2084	1807	1828	1807
q5	5672	5720	5773	5720
q6	219	141	149	141
q7	2221	1831	1761	1761
q8	3321	3625	3620	3620
q9	8997	8948	8968	8948
q10	3616	3574	3596	3574
q11	590	496	492	492
q12	816	599	597	597
q13	11877	3244	3303	3244
q14	315	293	264	264
q15	573	527	539	527
q16	678	641	654	641
q17	1866	1641	1609	1609
q18	8261	7648	7639	7639
q19	1722	1547	1583	1547
q20	2129	1855	1908	1855
q21	5467	5457	5438	5438
q22	642	555	590	555
Total cold run time: 71944 ms
Total hot run time: 60469 ms

@doris-robot

ClickBench: Total hot run time: 33.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 84cb3364c70fa46d6117a43136de8295be0a02b4, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.24	0.06	0.07
query4	1.62	0.11	0.10
query5	0.41	0.43	0.41
query6	1.17	0.66	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.49	0.50
query10	0.55	0.54	0.58
query11	0.14	0.11	0.11
query12	0.14	0.11	0.11
query13	0.61	0.61	0.61
query14	2.72	2.72	2.85
query15	0.91	0.84	0.84
query16	0.38	0.38	0.37
query17	1.01	1.03	1.05
query18	0.20	0.20	0.21
query19	1.89	1.86	1.97
query20	0.01	0.01	0.01
query21	15.37	0.59	0.60
query22	2.81	2.66	2.47
query23	16.94	1.09	0.92
query24	2.76	1.75	1.24
query25	0.30	0.10	0.15
query26	0.38	0.14	0.14
query27	0.04	0.04	0.05
query28	10.15	1.13	1.10
query29	12.52	3.26	3.22
query30	0.25	0.06	0.06
query31	2.84	0.39	0.38
query32	3.27	0.46	0.47
query33	3.00	2.96	3.08
query34	16.89	4.52	4.60
query35	4.59	4.53	4.59
query36	0.68	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.03
query40	0.15	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.99 s
Total hot run time: 33.76 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 20, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 95e9765 into apache:master Nov 21, 2024
27 of 28 checks passed
zddr added a commit to zddr/incubator-doris that referenced this pull request Dec 19, 2024
morrySnow pushed a commit that referenced this pull request Dec 24, 2024
…44911 (#45660)

pick: #44911 #43959

Only the Paimon-related code is picked; some code related to MTMV REFRESH is not picked.
zddr added a commit to zddr/incubator-doris that referenced this pull request Dec 24, 2024
yiguolei pushed a commit that referenced this pull request Dec 25, 2024
)

pick: #44911 #43959

Only the Paimon-related code is picked; some code related to MTMV REFRESH is not picked.
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
5 participants