[fix](hive)fix select count(*) hive full acid tb opt error. (#46732) #46805

Merged on Jan 12, 2025 (1 commit)


hubgeter
Contributor

What problem does this PR solve?

bp #46732 (backport of #46732 to branch-3.0)
Problem Summary:
Previous PR: #44038
The previous PR optimized how splits are generated for count(*) queries. However, that optimization caused problems for Hive ACID tables. This PR fixes them and adds tests.
For count(*) on a Hive full ACID table, the optimization cannot be applied: the files must still be split, because reading them requires merge-on-read. Hive insert-only ACID tables do not need to be split.
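The rule above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the code changed in this PR: `CountStarSplitSketch`, `TableKind`, `mustSplitForCountStar`, `RowId` and `countLiveRows` are invented names, and the sketch assumes that non-ACID tables keep the count(*) split optimization from #44038. It also shows why full ACID tables need merge-on-read: a row is deleted by a delete_delta entry carrying the same (originalTransaction, bucket, rowId) identity, so per-file row counts alone overcount. Records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CountStarSplitSketch {

    // Hypothetical classification of a Hive table (not a Doris type).
    enum TableKind { NON_ACID, INSERT_ONLY_ACID, FULL_ACID }

    // Returns true when data files must still be split and read for count(*).
    // Only full ACID tables need this, because deletes live in delete_delta
    // files and must be merged on read before rows can be counted.
    // Assumption: NON_ACID keeps the optimization from the earlier PR.
    static boolean mustSplitForCountStar(TableKind kind) {
        return kind == TableKind.FULL_ACID;
    }

    // Row identity that Hive full ACID delete deltas refer to.
    record RowId(long originalTransaction, int bucket, long rowId) {}

    // Merge-on-read count: an inserted row is live unless a delete delta
    // carries the same row identity.
    static long countLiveRows(List<RowId> insertedRows, List<RowId> deletedRows) {
        Set<RowId> deleted = new HashSet<>(deletedRows);
        long live = 0;
        for (RowId row : insertedRows) {
            if (!deleted.contains(row)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<RowId> inserted = List.of(
                new RowId(1, 0, 0), new RowId(1, 0, 1), new RowId(2, 0, 0));
        List<RowId> deleted = List.of(new RowId(1, 0, 1));
        System.out.println(mustSplitForCountStar(TableKind.FULL_ACID));        // true
        System.out.println(mustSplitForCountStar(TableKind.INSERT_ONLY_ACID)); // false
        System.out.println(countLiveRows(inserted, deleted));                  // 2
    }
}
```

In this sketch, an insert-only ACID table has no delete deltas, so like a non-ACID table it does not need the merge step, which matches why it does not need to be split.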

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@hubgeter
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H SF100 test result on commit b0a1514c87beb388e1873775d9ca223b1709432b, data reload: false

------ Round 1 ----------------------------------
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	17579	7333	7236	7236
q2	2045	172	186	172
q3	10674	1073	1155	1073
q4	10492	741	721	721
q5	7760	2854	2811	2811
q6	241	147	147	147
q7	978	611	604	604
q8	9371	1921	1998	1921
q9	6645	6441	6436	6436
q10	7038	2313	2330	2313
q11	472	258	263	258
q12	405	211	209	209
q13	18191	3069	3128	3069
q14	240	218	220	218
q15	571	516	525	516
q16	671	599	604	599
q17	983	630	594	594
q18	7349	6674	6631	6631
q19	1448	1054	1093	1054
q20	461	202	203	202
q21	4079	3187	3336	3187
q22	1096	999	993	993
Total cold run time: 108789 ms
Total hot run time: 40964 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	7243	7204	7218	7204
q2	320	228	237	228
q3	2926	2935	2948	2935
q4	2025	1878	1836	1836
q5	5700	5713	5767	5713
q6	226	143	139	139
q7	2280	1822	1817	1817
q8	3339	3586	3492	3492
q9	8780	8891	8870	8870
q10	3603	3515	3538	3515
q11	610	508	503	503
q12	813	628	598	598
q13	9401	3171	3163	3163
q14	301	290	267	267
q15	583	532	509	509
q16	699	672	667	667
q17	1852	1625	1592	1592
q18	8141	7846	7575	7575
q19	1684	1505	1602	1505
q20	2123	1880	1865	1865
q21	5396	5439	5488	5439
q22	1158	1065	1023	1023
Total cold run time: 69203 ms
Total hot run time: 60455 ms
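Judging from the numbers themselves (not from the tpch-tools scripts), each TPC-H row lists one cold run, two hot runs and the faster of the two hot runs, and the totals are column sums. The Java sketch below recomputes the Round 1 totals from the first three columns under that assumption; `BenchTotalsSketch` is a hypothetical name.

```java
import java.util.Arrays;

public class BenchTotalsSketch {

    // TPC-H Round 1 rows for q1..q22: {cold, hot run 1, hot run 2} in ms,
    // copied from the Round 1 table in the comment above.
    static final long[][] ROUND1 = {
        {17579, 7333, 7236}, {2045, 172, 186}, {10674, 1073, 1155},
        {10492, 741, 721}, {7760, 2854, 2811}, {241, 147, 147},
        {978, 611, 604}, {9371, 1921, 1998}, {6645, 6441, 6436},
        {7038, 2313, 2330}, {472, 258, 263}, {405, 211, 209},
        {18191, 3069, 3128}, {240, 218, 220}, {571, 516, 525},
        {671, 599, 604}, {983, 630, 594}, {7349, 6674, 6631},
        {1448, 1054, 1093}, {461, 202, 203}, {4079, 3187, 3336},
        {1096, 999, 993}
    };

    // Total cold run time: sum of the cold-run column.
    static long totalCold(long[][] rows) {
        return Arrays.stream(rows).mapToLong(r -> r[0]).sum();
    }

    // Total hot run time: sum of the best (minimum) hot run per query.
    static long totalHot(long[][] rows) {
        return Arrays.stream(rows).mapToLong(r -> Math.min(r[1], r[2])).sum();
    }

    public static void main(String[] args) {
        System.out.println(totalCold(ROUND1)); // 108789
        System.out.println(totalHot(ROUND1));  // 40964
    }
}
```

Both results match the reported Round 1 totals, which supports reading the last column as the best hot run.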

@doris-robot

TPC-DS: Total hot run time: 198603 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b0a1514c87beb388e1873775d9ca223b1709432b, data reload: false

Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
query1	1302	946	907	907
query2	6251	2106	2027	2027
query3	10935	4388	4362	4362
query4	66297	29229	23597	23597
query5	4969	457	451	451
query6	410	168	165	165
query7	5676	305	311	305
query8	317	242	226	226
query9	9288	2686	2664	2664
query10	504	279	251	251
query11	17521	15283	15971	15283
query12	160	111	105	105
query13	1564	451	468	451
query14	9786	7606	7716	7606
query15	209	176	191	176
query16	7101	498	470	470
query17	1070	592	591	591
query18	1928	345	355	345
query19	245	165	157	157
query20	116	110	113	110
query21	210	107	107	107
query22	4837	4548	4584	4548
query23	34765	34148	34271	34148
query24	6075	2929	2934	2929
query25	540	446	458	446
query26	662	178	168	168
query27	1731	349	365	349
query28	4020	2479	2470	2470
query29	726	483	470	470
query30	246	166	178	166
query31	1031	842	833	833
query32	73	55	58	55
query33	429	281	283	281
query34	928	516	533	516
query35	835	745	711	711
query36	1090	969	986	969
query37	124	82	71	71
query38	4156	4110	4029	4029
query39	1527	1498	1508	1498
query40	210	110	100	100
query41	50	49	47	47
query42	116	105	100	100
query43	533	496	483	483
query44	1233	837	829	829
query45	188	167	165	165
query46	1170	734	733	733
query47	2040	1951	1937	1937
query48	494	380	384	380
query49	768	421	404	404
query50	838	433	419	419
query51	7556	7296	7163	7163
query52	98	86	85	85
query53	267	184	183	183
query54	543	445	447	445
query55	78	74	75	74
query56	261	244	252	244
query57	1232	1116	1091	1091
query58	213	206	203	203
query59	3120	3020	3004	3004
query60	275	250	251	250
query61	108	105	107	105
query62	772	656	675	656
query63	217	197	195	195
query64	1364	680	632	632
query65	3263	3216	3232	3216
query66	673	312	304	304
query67	15938	15527	15544	15527
query68	3956	586	573	573
query69	441	285	265	265
query70	1200	1094	1101	1094
query71	356	255	249	249
query72	6382	3985	3996	3985
query73	761	344	345	344
query74	10222	9117	9121	9117
query75	3363	2617	2703	2617
query76	1879	1158	1011	1011
query77	485	266	310	266
query78	10428	9747	9595	9595
query79	1801	596	603	596
query80	1443	434	412	412
query81	524	248	240	240
query82	1305	114	124	114
query83	268	140	142	140
query84	282	82	75	75
query85	1012	290	294	290
query86	421	291	312	291
query87	4393	4279	4277	4277
query88	3875	2387	2350	2350
query89	417	292	288	288
query90	1934	190	186	186
query91	180	149	146	146
query92	70	49	51	49
query93	2149	548	549	548
query94	858	301	290	290
query95	354	251	256	251
query96	615	282	285	282
query97	3381	3205	3233	3205
query98	221	223	196	196
query99	1618	1308	1274	1274
Total cold run time: 319866 ms
Total hot run time: 198603 ms

@doris-robot

ClickBench: Total hot run time: 33.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b0a1514c87beb388e1873775d9ca223b1709432b, data reload: false

Query	Cold run (s)	Hot run 1 (s)	Hot run 2 (s)
query1	0.04	0.03	0.02
query2	0.06	0.03	0.03
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.50	0.54	0.51
query6	1.14	0.73	0.73
query7	0.02	0.01	0.02
query8	0.04	0.04	0.03
query9	0.56	0.51	0.51
query10	0.56	0.53	0.57
query11	0.14	0.10	0.10
query12	0.14	0.11	0.10
query13	0.62	0.59	0.60
query14	2.92	2.88	2.92
query15	0.90	0.82	0.83
query16	0.40	0.41	0.39
query17	1.02	1.06	1.01
query18	0.23	0.22	0.22
query19	2.00	1.87	2.05
query20	0.02	0.01	0.01
query21	15.38	0.60	0.58
query22	2.42	2.43	1.91
query23	17.13	0.96	0.74
query24	2.93	1.38	1.94
query25	0.24	0.22	0.05
query26	0.58	0.13	0.14
query27	0.05	0.03	0.04
query28	9.93	1.09	1.07
query29	12.60	3.26	3.25
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.24	0.46	0.47
query33	2.96	3.08	3.08
query34	17.13	4.49	4.49
query35	4.58	4.54	4.54
query36	0.69	0.50	0.51
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.03	0.03
query40	0.15	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 106.6 s
Total hot run time: 33.5 s

@morningman merged commit 1e64740 into apache:branch-3.0 on Jan 12, 2025
21 of 22 checks passed