[fix](hive)fix select count(*) hive full acid tb opt error. #46732

Merged: 1 commit into apache:master, Jan 10, 2025


@hubgeter (Contributor) commented Jan 9, 2025

What problem does this PR solve?

Problem Summary:
Before this PR: #44038
That PR optimized how splits are generated for count(*) queries, but it caused problems with Hive ACID tables. This PR fixes those problems and adds tests.
For count(*), reads of a Hive full ACID table cannot use this optimization: the files must still be split, because merge-on-read is required to apply the delete deltas. Hive insert-only ACID tables have no delete deltas and do not need to be split.
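The rule in the summary can be illustrated with a small sketch of the split decision. This is a hypothetical illustration, not the Doris FE code: the names `HiveTableKind`, `shouldSplitFiles`, and `planSplits` are invented for this example, and it assumes (from the description of #44038) that the count(*) optimization means emitting one split per file instead of cutting files into several ranges.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the count(*) split rule described in this PR.
 * None of these names are Doris APIs; they only illustrate the decision.
 */
public class CountStarSplitSketch {

    // Assumed classification of Hive tables for this example.
    enum HiveTableKind { NON_ACID, INSERT_ONLY_ACID, FULL_ACID }

    /** Returns true if files must still be cut into multiple splits. */
    static boolean shouldSplitFiles(HiveTableKind kind, boolean isCountStarPushdown) {
        if (!isCountStarPushdown) {
            return true; // ordinary scans keep the normal splitting
        }
        // Full ACID reads need merge-on-read (base files plus delete deltas),
        // so the count(*) shortcut must not skip splitting for them.
        return kind == HiveTableKind.FULL_ACID;
    }

    /** Produces {start, length} ranges for a single file. */
    static List<long[]> planSplits(long fileSize, long splitSize, boolean split) {
        List<long[]> ranges = new ArrayList<>();
        if (!split || fileSize <= splitSize) {
            ranges.add(new long[] {0L, fileSize}); // whole file as one split
            return ranges;
        }
        for (long start = 0; start < fileSize; start += splitSize) {
            ranges.add(new long[] {start, Math.min(splitSize, fileSize - start)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long fileSize = 300L;
        long splitSize = 128L;
        for (HiveTableKind kind : HiveTableKind.values()) {
            boolean split = shouldSplitFiles(kind, true);
            int n = planSplits(fileSize, splitSize, split).size();
            // NON_ACID and INSERT_ONLY_ACID print 1; FULL_ACID prints 3
            System.out.println(kind + " count(*) splits=" + n);
        }
    }
}
```

Keeping the decision in one predicate makes the bug class visible: the earlier optimization effectively treated every table kind like `NON_ACID`, while full ACID tables have to fall through to normal splitting.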

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter (Contributor, Author) commented Jan 9, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32414 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a3a96142f6f0f69d814b5e4a23ec931e67b6b02a, data reload: false

------ Round 1 ----------------------------------
query	cold	hot_1	hot_2	best_hot (ms)
q1	17607	6077	6012	6012
q2	2043	302	165	165
q3	10483	1234	701	701
q4	10219	862	428	428
q5	7529	2154	1931	1931
q6	207	178	148	148
q7	876	764	597	597
q8	9238	1368	1176	1176
q9	5133	4791	4997	4791
q10	6752	2307	1875	1875
q11	473	281	263	263
q12	359	356	215	215
q13	17756	3602	3079	3079
q14	242	225	204	204
q15	557	505	489	489
q16	611	610	581	581
q17	587	845	336	336
q18	7099	6430	6389	6389
q19	1763	951	548	548
q20	320	325	200	200
q21	2746	2088	1982	1982
q22	361	331	304	304
Total cold run time: 102961 ms
Total hot run time: 32414 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot_1	hot_2	best_hot (ms)
q1	6242	6189	6211	6189
q2	236	323	225	225
q3	2200	2662	2352	2352
q4	1381	1830	1322	1322
q5	4309	4719	4670	4670
q6	194	178	148	148
q7	2156	2015	1805	1805
q8	2603	2761	2615	2615
q9	7237	7308	7206	7206
q10	3052	3323	2821	2821
q11	586	505	518	505
q12	658	749	672	672
q13	3462	3895	3324	3324
q14	293	300	270	270
q15	544	518	499	499
q16	635	698	651	651
q17	1208	1722	1250	1250
q18	7738	7447	7022	7022
q19	780	1064	1042	1042
q20	1853	1959	1867	1867
q21	5366	5081	4718	4718
q22	614	568	565	565
Total cold run time: 53347 ms
Total hot run time: 51738 ms
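The report does not say how its totals are computed. Recomputing Round 1 suggests that "Total cold run time" is the sum of the first column, and "Total hot run time" is the sum of each query's faster hot run (the smaller of the second and third columns, which matches the fourth column). The sketch below checks that reading. The class and method names are illustrative only and are not part of the Doris benchmark tools.

```java
/**
 * Recomputes the Round 1 TPC-H totals from the per-query rows in the report.
 * Assumption: the hot total is the sum of min(hot run 1, hot run 2) per query.
 */
public class TpchRound1Totals {

    // Per-query {cold, hot run 1, hot run 2} in ms, copied from Round 1 (q1..q22).
    static final long[][] ROUND1 = {
        {17607, 6077, 6012}, {2043, 302, 165}, {10483, 1234, 701}, {10219, 862, 428},
        {7529, 2154, 1931}, {207, 178, 148}, {876, 764, 597}, {9238, 1368, 1176},
        {5133, 4791, 4997}, {6752, 2307, 1875}, {473, 281, 263}, {359, 356, 215},
        {17756, 3602, 3079}, {242, 225, 204}, {557, 505, 489}, {611, 610, 581},
        {587, 845, 336}, {7099, 6430, 6389}, {1763, 951, 548}, {320, 325, 200},
        {2746, 2088, 1982}, {361, 331, 304},
    };

    /** Sum of the first (cold) run of every query. */
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    /** Sum of the faster of the two hot runs of every query. */
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Prints 102961 ms and 32414 ms, matching the reported Round 1 totals.
        System.out.println("Total cold run time: " + totalCold(ROUND1) + " ms");
        System.out.println("Total hot run time: " + totalHot(ROUND1) + " ms");
    }
}
```

Both recomputed values agree with the report, so the per-query hot figure appears to be a best-of-two rather than an average.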

@doris-robot

TPC-DS: Total hot run time: 187958 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a3a96142f6f0f69d814b5e4a23ec931e67b6b02a, data reload: false

query	cold	hot_1	hot_2	best_hot (ms)
query1	968	371	371	371
query2	6532	2411	2389	2389
query3	6702	217	213	213
query4	33486	23510	23695	23510
query5	4383	627	459	459
query6	280	205	174	174
query7	4900	507	317	317
query8	286	231	225	225
query9	9429	2702	2701	2701
query10	482	317	247	247
query11	17799	15188	15142	15142
query12	155	107	106	106
query13	1648	524	385	385
query14	10624	7196	6567	6567
query15	237	201	192	192
query16	8275	606	433	433
query17	1584	738	571	571
query18	2135	412	314	314
query19	222	188	157	157
query20	117	112	114	112
query21	212	123	104	104
query22	4389	4401	4293	4293
query23	34514	32959	33076	32959
query24	6375	2290	2216	2216
query25	501	444	389	389
query26	1210	270	157	157
query27	2035	451	334	334
query28	5069	2476	2453	2453
query29	733	527	412	412
query30	230	179	148	148
query31	945	875	783	783
query32	93	60	59	59
query33	546	359	287	287
query34	739	834	504	504
query35	778	800	723	723
query36	1019	1035	912	912
query37	118	100	77	77
query38	4032	4125	4024	4024
query39	1474	1406	1414	1406
query40	202	111	101	101
query41	53	48	50	48
query42	126	110	104	104
query43	541	530	513	513
query44	1336	829	817	817
query45	179	169	157	157
query46	851	1025	650	650
query47	1822	1782	1763	1763
query48	375	403	327	327
query49	788	473	380	380
query50	618	642	396	396
query51	6899	6911	6830	6830
query52	103	99	94	94
query53	217	244	184	184
query54	482	472	413	413
query55	81	78	82	78
query56	251	246	232	232
query57	1181	1135	1111	1111
query58	241	229	222	222
query59	3060	3037	2921	2921
query60	282	282	254	254
query61	138	109	108	108
query62	831	754	694	694
query63	224	193	187	187
query64	4500	1009	679	679
query65	3229	3153	3168	3153
query66	1069	415	308	308
query67	15839	15670	15467	15467
query68	8215	699	516	516
query69	467	279	247	247
query70	1198	1099	1089	1089
query71	446	286	250	250
query72	6165	3806	3975	3806
query73	645	744	357	357
query74	10418	8851	8937	8851
query75	4139	3142	2615	2615
query76	3704	1172	802	802
query77	759	355	271	271
query78	9906	9973	9319	9319
query79	3762	801	587	587
query80	708	517	427	427
query81	485	271	228	228
query82	636	150	128	128
query83	164	161	146	146
query84	238	87	70	70
query85	771	361	293	293
query86	396	320	268	268
query87	4465	4414	4228	4228
query88	5005	2178	2155	2155
query89	394	326	294	294
query90	1831	189	200	189
query91	133	136	112	112
query92	72	62	52	52
query93	2327	856	538	538
query94	667	393	285	285
query95	338	256	252	252
query96	485	595	275	275
query97	2843	2946	2727	2727
query98	230	201	199	199
query99	1465	1492	1366	1366
Total cold run time: 294251 ms
Total hot run time: 187958 ms

@doris-robot

ClickBench: Total hot run time: 31.78 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a3a96142f6f0f69d814b5e4a23ec931e67b6b02a, data reload: false

query	run_1	run_2	run_3 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.24	0.07	0.06
query4	1.60	0.11	0.11
query5	0.43	0.43	0.42
query6	1.17	0.66	0.64
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.60	0.49	0.49
query10	0.54	0.56	0.54
query11	0.13	0.10	0.11
query12	0.14	0.11	0.11
query13	0.61	0.59	0.60
query14	2.72	2.73	2.71
query15	0.90	0.84	0.83
query16	0.38	0.39	0.39
query17	1.07	1.08	1.06
query18	0.23	0.21	0.21
query19	1.91	1.90	1.98
query20	0.01	0.01	0.01
query21	15.35	0.90	0.58
query22	0.77	0.82	0.90
query23	14.99	1.39	0.59
query24	3.26	1.13	1.75
query25	0.17	0.12	0.17
query26	0.21	0.14	0.14
query27	0.07	0.06	0.06
query28	14.32	1.50	1.05
query29	12.57	3.86	3.24
query30	0.26	0.09	0.06
query31	2.84	0.58	0.39
query32	3.22	0.54	0.45
query33	3.01	3.17	3.14
query34	16.72	5.08	4.51
query35	4.56	4.44	4.45
query36	0.83	0.48	0.50
query37	0.10	0.06	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.51 s
Total hot run time: 31.78 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 39.34% (10252/26058)
Line Coverage: 30.52% (87403/286399)
Region Coverage: 29.56% (44539/150678)
Branch Coverage: 26.12% (22805/87322)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a3a96142f6f0f69d814b5e4a23ec931e67b6b02a_a3a96142f6f0f69d814b5e4a23ec931e67b6b02a/report/index.html

@morningman (Contributor) left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 10, 2025

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@kaka11chen (Contributor) left a comment

LGTM

@morningman morningman merged commit a7df843 into apache:master Jan 10, 2025
28 of 30 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Jan 11, 2025
…6732)

Problem Summary:
Before this PR: apache#44038
That PR optimized how splits are generated for count(*) queries, but it
caused problems with Hive ACID tables. This PR fixes those problems and
adds tests.
For count(*), reads of a Hive full ACID table cannot use this
optimization: the files must still be split, because merge-on-read is
required. Hive insert-only ACID tables do not need to be split.
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
5 participants