
[fix](split)Fixed the bug that batch mode split could not query data in multiple be scenarios. #46218

Merged: 1 commit, Dec 31, 2024


@hubgeter (Contributor) commented Dec 31, 2024

What problem does this PR solve?

Problem Summary:
In scenarios with multiple BEs, batch mode split sometimes returned no data.

The cause is that the estimated `numApproximateSplits()` can be small, so dividing it by the current number of BEs (integer division) may yield 0. As a result, no splits are distributed to the BEs and the query returns an empty result.
The fix takes the maximum of that quotient and 1.
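A minimal sketch of the failure mode and the clamp described above. All names here (`BatchSplitSketch`, `splitsPerBackendBuggy`, `splitsPerBackendFixed`, `dispatch`) are hypothetical and this is not the actual Doris FE code; the `dispatch` method is a simplified model that assumes each BE pulls at most `quota` splits per round.

```java
// Hypothetical sketch, not the real Doris FE implementation.
public final class BatchSplitSketch {

    // Unclamped quota: integer division truncates toward zero, so an estimate
    // smaller than the BE count gives a per-BE quota of 0. Assumes numBackends > 0.
    static long splitsPerBackendBuggy(long numApproximateSplits, int numBackends) {
        return numApproximateSplits / numBackends;
    }

    // Clamped quota: take the max of the quotient and 1, as the PR describes.
    static long splitsPerBackendFixed(long numApproximateSplits, int numBackends) {
        return Math.max(numApproximateSplits / numBackends, 1L);
    }

    // Simplified dispatch model (assumption): each round, every BE takes up to
    // `quota` splits from the remaining pool. Returns how many were dispatched.
    static long dispatch(long totalSplits, int numBackends, long quota) {
        long remaining = totalSplits;
        long dispatched = 0;
        while (remaining > 0 && quota > 0) {
            for (int be = 0; be < numBackends && remaining > 0; be++) {
                long take = Math.min(quota, remaining);
                remaining -= take;
                dispatched += take;
            }
        }
        return dispatched;
    }

    public static void main(String[] args) {
        long estimatedSplits = 2; // small estimate, e.g. a tiny table
        int numBackends = 3;      // multi-BE cluster
        long buggy = splitsPerBackendBuggy(estimatedSplits, numBackends);
        long fixed = splitsPerBackendFixed(estimatedSplits, numBackends);
        System.out.println("buggy quota = " + buggy + ", dispatched = "
                + dispatch(estimatedSplits, numBackends, buggy));
        System.out.println("fixed quota = " + fixed + ", dispatched = "
                + dispatch(estimatedSplits, numBackends, fixed));
    }
}
```

With an estimate of 2 splits and 3 BEs, the unclamped quota is 0 and nothing is dispatched (the empty-result symptom); the clamped quota is 1 and both splits are dispatched.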

Follow-up to #45148

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 32775 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7b08a3eae01b2d6821b2c0e70d3270602655e028, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17620	6120	6045	6045
q2	2050	335	173	173
q3	10376	1257	717	717
q4	10199	861	426	426
q5	7518	2157	1993	1993
q6	205	182	152	152
q7	900	745	627	627
q8	9240	1323	1189	1189
q9	5207	5004	4984	4984
q10	6765	2280	1849	1849
q11	490	283	267	267
q12	365	346	213	213
q13	17769	3627	3037	3037
q14	235	231	211	211
q15	551	502	487	487
q16	624	626	598	598
q17	556	851	319	319
q18	7068	6501	6445	6445
q19	1242	957	566	566
q20	309	332	188	188
q21	2770	2151	1980	1980
q22	356	330	309	309
Total cold run time: 102415 ms
Total hot run time: 32775 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	6225	6227	6369	6227
q2	229	325	226	226
q3	2207	2638	2358	2358
q4	1409	1822	1393	1393
q5	4363	4712	4741	4712
q6	187	184	146	146
q7	2164	2071	1796	1796
q8	2589	2753	2646	2646
q9	7326	7227	7156	7156
q10	3061	3289	2827	2827
q11	615	514	504	504
q12	668	767	609	609
q13	3403	3779	3073	3073
q14	302	303	266	266
q15	578	510	502	502
q16	673	675	646	646
q17	1188	1727	1243	1243
q18	7860	7441	7381	7381
q19	841	1024	1100	1024
q20	1951	1967	1818	1818
q21	5312	5024	4888	4888
q22	625	616	577	577
Total cold run time: 53776 ms
Total hot run time: 52018 ms
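The report columns are not labeled. Judging from the totals, the first column appears to be a single cold run, the next two are hot runs, and the last is the faster of the two hot runs. The sketch below is an inference, not the robot's actual script (`TpchTotals` is a hypothetical name); it recomputes the Round 1 totals from the rows above under that reading.

```java
// Hypothetical check of the Round 1 totals; column meaning is inferred, not documented.
public final class TpchTotals {
    // {cold, hot1, hot2, best_hot} in ms, copied from Round 1 rows q1..q22.
    static final long[][] ROUND1 = {
        {17620, 6120, 6045, 6045}, // q1
        {2050, 335, 173, 173},     // q2
        {10376, 1257, 717, 717},   // q3
        {10199, 861, 426, 426},    // q4
        {7518, 2157, 1993, 1993},  // q5
        {205, 182, 152, 152},      // q6
        {900, 745, 627, 627},      // q7
        {9240, 1323, 1189, 1189},  // q8
        {5207, 5004, 4984, 4984},  // q9
        {6765, 2280, 1849, 1849},  // q10
        {490, 283, 267, 267},      // q11
        {365, 346, 213, 213},      // q12
        {17769, 3627, 3037, 3037}, // q13
        {235, 231, 211, 211},      // q14
        {551, 502, 487, 487},      // q15
        {624, 626, 598, 598},      // q16
        {556, 851, 319, 319},      // q17
        {7068, 6501, 6445, 6445},  // q18
        {1242, 957, 566, 566},     // q19
        {309, 332, 188, 188},      // q20
        {2770, 2151, 1980, 1980},  // q21
        {356, 330, 309, 309},      // q22
    };

    // Total cold run time: sum of the first column.
    static long coldTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // Total hot run time: sum of the faster of the two hot runs per query.
    static long hotTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("cold total = " + coldTotal(ROUND1) + " ms");
        System.out.println("hot total  = " + hotTotal(ROUND1) + " ms");
    }
}
```

Both sums match the reported Round 1 figures (102415 ms cold, 32775 ms hot), which supports that reading of the columns.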

@doris-robot

TPC-DS: Total hot run time: 190504 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7b08a3eae01b2d6821b2c0e70d3270602655e028, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	972	372	356	356
query2	6537	2366	2292	2292
query3	6704	220	206	206
query4	33575	24084	23619	23619
query5	4309	625	455	455
query6	285	203	211	203
query7	4626	504	311	311
query8	293	233	234	233
query9	9728	2720	2712	2712
query10	467	303	239	239
query11	18084	15340	15281	15281
query12	156	104	106	104
query13	1655	529	400	400
query14	10687	7397	6794	6794
query15	230	201	192	192
query16	8092	594	417	417
query17	1532	720	557	557
query18	2116	406	318	318
query19	226	182	165	165
query20	127	117	118	117
query21	215	123	103	103
query22	4411	4317	4296	4296
query23	34651	34516	33650	33650
query24	6476	2297	2262	2262
query25	434	430	373	373
query26	1058	251	151	151
query27	2069	471	335	335
query28	5291	2443	2393	2393
query29	521	529	416	416
query30	224	178	158	158
query31	1042	893	804	804
query32	82	59	58	58
query33	495	384	290	290
query34	767	827	521	521
query35	786	817	728	728
query36	1001	1031	948	948
query37	111	97	75	75
query38	4330	4321	4107	4107
query39	1531	1439	1499	1439
query40	200	114	101	101
query41	47	48	49	48
query42	115	102	101	101
query43	515	534	492	492
query44	1327	800	799	799
query45	192	176	168	168
query46	864	1052	649	649
query47	1909	1923	1862	1862
query48	382	409	321	321
query49	786	508	420	420
query50	631	668	383	383
query51	7099	7323	7037	7037
query52	101	100	89	89
query53	224	248	190	190
query54	468	494	410	410
query55	79	79	80	79
query56	252	260	248	248
query57	1184	1199	1159	1159
query58	235	227	219	219
query59	3001	3173	3009	3009
query60	290	258	246	246
query61	117	110	107	107
query62	887	800	739	739
query63	220	186	197	186
query64	4551	1060	735	735
query65	3274	3204	3209	3204
query66	1071	427	322	322
query67	15904	15793	15622	15622
query68	8038	753	515	515
query69	452	298	260	260
query70	1239	1174	1130	1130
query71	413	271	255	255
query72	5744	3876	3941	3876
query73	649	749	362	362
query74	10273	8988	9075	8988
query75	4246	3184	2628	2628
query76	3296	1198	778	778
query77	762	363	266	266
query78	10093	10470	9466	9466
query79	3283	906	586	586
query80	590	538	442	442
query81	493	276	229	229
query82	666	154	126	126
query83	162	162	143	143
query84	235	84	76	76
query85	811	369	296	296
query86	410	286	312	286
query87	4435	4622	4311	4311
query88	5016	2209	2174	2174
query89	409	342	298	298
query90	1778	184	185	184
query91	132	133	109	109
query92	65	56	52	52
query93	1757	882	524	524
query94	682	385	277	277
query95	337	260	251	251
query96	492	611	277	277
query97	2736	2850	2670	2670
query98	234	213	192	192
query99	2027	1600	1454	1454
Total cold run time: 293776 ms
Total hot run time: 190504 ms

@doris-robot

ClickBench: Total hot run time: 31.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7b08a3eae01b2d6821b2c0e70d3270602655e028, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.05	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.62	0.11	0.11
query5	0.42	0.39	0.41
query6	1.16	0.66	0.64
query7	0.02	0.01	0.02
query8	0.03	0.04	0.03
query9	0.59	0.51	0.51
query10	0.55	0.56	0.56
query11	0.15	0.10	0.10
query12	0.13	0.12	0.11
query13	0.61	0.61	0.60
query14	2.73	2.74	2.85
query15	0.89	0.82	0.83
query16	0.40	0.40	0.39
query17	1.10	1.08	1.01
query18	0.24	0.21	0.20
query19	1.86	1.86	2.00
query20	0.01	0.01	0.01
query21	15.36	0.91	0.56
query22	0.75	0.76	0.68
query23	15.28	1.39	0.64
query24	2.85	1.00	1.12
query25	0.22	0.17	0.17
query26	0.22	0.14	0.14
query27	0.06	0.05	0.06
query28	14.49	1.54	1.05
query29	12.58	3.97	3.26
query30	0.25	0.09	0.07
query31	2.80	0.60	0.38
query32	3.23	0.55	0.46
query33	3.07	3.05	3.11
query34	16.77	5.07	4.47
query35	4.48	4.48	4.48
query36	0.63	0.50	0.52
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.18	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 106.42 s
Total hot run time: 31.48 s

Contributor

PR approved by at least one committer and no changes requested.

@github-actions (bot) added the approved and reviewed labels Dec 31, 2024
Contributor

PR approved by anyone and no changes requested.

@morningman merged commit c83e45d into apache:master Dec 31, 2024 (26 of 28 checks passed)
github-actions bot pushed a commit that referenced this pull request Dec 31, 2024
…in multiple be scenarios. (#46218)

### What problem does this PR solve?
Problem Summary:
In multiple be scenarios, batch mode split sometimes could not query
data.

The reason is that the estimated `numApproximateSplits()` may be
relatively small, and the value after dividing the current number of be
may be 0. As a result, the split will not be distributed to be, and the
query result will be empty.
We need to take the max of the value after division and 1.
github-actions bot pushed a commit that referenced this pull request Dec 31, 2024
yiguolei pushed a commit that referenced this pull request Dec 31, 2024
… query data in multiple be scenarios. #46218 (#46227)

Cherry-picked from #46218

Co-authored-by: daidai <[email protected]>
morningman pushed a commit that referenced this pull request Jan 1, 2025
… query data in multiple be scenarios. #46218 (#46226)

Cherry-picked from #46218

Co-authored-by: daidai <[email protected]>
Labels: approved, dev/2.1.8-merged, dev/3.0.4-merged, reviewed

6 participants