[fix](Nereids) fix fe folding constant of string functions and add more cases (#45233) #46525

Merged
merged 3 commits into from
Jan 9, 2025


LiBinfeng-01
Collaborator

pick: #45233

Issue Number: #44666
Related PR: #40441

Problem Summary:

  • select substring_index('哈哈哈AAA','A', 1); Java's String.split takes an optional second parameter, 'limit', which defaults to zero. With a limit of zero, trailing empty strings are removed from the result, so splitting '哈哈哈AAA' on 'A' yields only '哈哈哈'. What substring_index needs is the full list of parts: '哈哈哈', '', '', ''. The split limit should therefore be changed to -1 so that trailing empty strings are kept in the list of parts.
  • Reorganize constant folding of string functions in FE and add more cases.
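
The split behavior described above can be reproduced with plain Java. This is a minimal sketch, not the Doris FE code: the class `SplitLimitDemo` and the helper `splitLiteral` are hypothetical names, and quoting the delimiter with `Pattern.quote` is an assumption made here so that it is treated as literal text rather than a regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimitDemo {
    // Hypothetical helper: split on a literal delimiter with an explicit limit.
    // limit == 0 drops trailing empty strings; a negative limit keeps them.
    public static String[] splitLiteral(String input, String delimiter, int limit) {
        return input.split(Pattern.quote(delimiter), limit);
    }

    public static void main(String[] args) {
        String input = "哈哈哈AAA";

        // Default behavior (limit 0): the three trailing empty parts are removed.
        String[] trimmed = splitLiteral(input, "A", 0);
        System.out.println(trimmed.length + " " + Arrays.toString(trimmed)); // length 1

        // limit -1: every part is kept, including the trailing empty strings.
        String[] full = splitLiteral(input, "A", -1);
        System.out.println(full.length + " " + Arrays.toString(full)); // length 4
    }
}
```

With limit 0 the array has one element, so any logic that compares the requested count against the number of parts sees too few parts; with limit -1 it sees all four.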

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@LiBinfeng-01
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40970 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c37507c191259e81dd624d4351bcffcc6d375b91, data reload: false

------ Round 1 ----------------------------------
q1	17862	7504	7352	7352
q2	2056	187	153	153
q3	10702	1134	1164	1134
q4	10469	750	765	750
q5	7756	3042	2818	2818
q6	243	148	147	147
q7	993	614	616	614
q8	9360	1969	1940	1940
q9	6627	6395	6436	6395
q10	6986	2233	2307	2233
q11	455	275	277	275
q12	411	208	213	208
q13	17794	3004	2992	2992
q14	238	210	217	210
q15	557	513	525	513
q16	665	609	612	609
q17	973	518	592	518
q18	7373	6609	6723	6609
q19	1393	1043	1054	1043
q20	478	206	199	199
q21	3966	3305	3261	3261
q22	1115	997	1019	997
Total cold run time: 108472 ms
Total hot run time: 40970 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7270	7252	7245	7245
q2	329	236	226	226
q3	2904	2775	2718	2718
q4	2001	1693	1743	1693
q5	5444	5454	5453	5453
q6	216	137	142	137
q7	2075	1696	1714	1696
q8	3228	3380	3431	3380
q9	8560	8550	8568	8550
q10	3531	3423	3481	3423
q11	589	484	494	484
q12	791	571	599	571
q13	8645	2984	2991	2984
q14	284	267	260	260
q15	576	516	520	516
q16	713	677	662	662
q17	1798	1584	1598	1584
q18	7795	7417	7347	7347
q19	1652	1595	1568	1568
q20	2045	1820	1812	1812
q21	5383	5122	5167	5122
q22	1106	997	1026	997
Total cold run time: 66935 ms
Total hot run time: 58428 ms

@doris-robot

TPC-DS: Total hot run time: 192826 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c37507c191259e81dd624d4351bcffcc6d375b91, data reload: false

query1	1009	375	359	359
query2	6510	2110	1968	1968
query3	6696	216	220	216
query4	33990	23520	23658	23520
query5	4310	462	465	462
query6	269	170	168	168
query7	4649	310	310	310
query8	280	219	218	218
query9	9520	2715	2717	2715
query10	486	255	261	255
query11	18198	15075	15206	15075
query12	165	109	109	109
query13	1643	425	411	411
query14	10216	7619	7507	7507
query15	289	184	175	175
query16	8093	492	491	491
query17	1760	592	578	578
query18	2153	315	319	315
query19	384	160	155	155
query20	118	109	112	109
query21	208	106	103	103
query22	4755	4446	4605	4446
query23	34895	33889	34030	33889
query24	11207	2871	2910	2871
query25	680	427	429	427
query26	1569	167	174	167
query27	2852	343	355	343
query28	8051	2467	2424	2424
query29	984	473	454	454
query30	332	168	172	168
query31	1051	805	829	805
query32	95	60	60	60
query33	797	292	301	292
query34	938	505	518	505
query35	905	721	730	721
query36	1113	958	987	958
query37	144	77	73	73
query38	3996	3886	3828	3828
query39	1512	1436	1433	1433
query40	291	103	104	103
query41	54	52	51	51
query42	117	105	103	103
query43	537	493	493	493
query44	1250	824	821	821
query45	185	169	171	169
query46	1158	734	738	734
query47	1933	1844	1846	1844
query48	483	386	397	386
query49	1232	411	400	400
query50	820	419	417	417
query51	7279	7096	7036	7036
query52	106	97	92	92
query53	259	184	192	184
query54	1376	478	468	468
query55	84	81	82	81
query56	293	268	261	261
query57	1226	1134	1149	1134
query58	249	219	221	219
query59	3079	3150	2909	2909
query60	291	272	286	272
query61	148	122	114	114
query62	871	677	671	671
query63	225	193	192	192
query64	5285	678	666	666
query65	3321	3183	3263	3183
query66	1427	310	313	310
query67	16072	15735	15613	15613
query68	4801	571	568	568
query69	440	283	266	266
query70	1183	1142	1130	1130
query71	421	254	259	254
query72	6442	4069	4061	4061
query73	781	351	357	351
query74	10157	9046	9012	9012
query75	3638	2632	2622	2622
query76	2921	1098	1059	1059
query77	394	286	279	279
query78	10537	9592	9697	9592
query79	1979	613	611	611
query80	1081	430	427	427
query81	555	240	239	239
query82	925	114	118	114
query83	217	146	151	146
query84	237	77	81	77
query85	1295	315	296	296
query86	440	294	290	290
query87	4529	4252	4264	4252
query88	3820	2442	2397	2397
query89	420	295	296	295
query90	2034	195	193	193
query91	187	170	154	154
query92	67	53	52	52
query93	1619	558	543	543
query94	909	293	295	293
query95	353	265	264	264
query96	619	286	284	284
query97	3381	3228	3192	3192
query98	217	198	196	196
query99	1517	1308	1312	1308
Total cold run time: 304589 ms
Total hot run time: 192826 ms

@doris-robot

ClickBench: Total hot run time: 32.64 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c37507c191259e81dd624d4351bcffcc6d375b91, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.55	0.53	0.52
query6	1.14	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.50	0.49
query10	0.57	0.55	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.12
query13	0.61	0.60	0.61
query14	2.92	2.92	2.92
query15	0.90	0.82	0.82
query16	0.38	0.37	0.38
query17	0.95	1.05	1.07
query18	0.22	0.22	0.21
query19	1.90	1.87	2.05
query20	0.01	0.02	0.01
query21	15.36	0.60	0.58
query22	3.05	2.83	1.93
query23	16.99	0.90	0.77
query24	2.60	1.46	0.47
query25	0.23	0.07	0.16
query26	0.40	0.14	0.14
query27	0.05	0.05	0.04
query28	11.01	1.10	1.07
query29	12.57	3.34	3.28
query30	0.25	0.06	0.06
query31	2.86	0.39	0.38
query32	3.25	0.46	0.46
query33	3.02	3.01	3.04
query34	16.76	4.50	4.47
query35	4.60	4.52	4.63
query36	0.66	0.50	0.50
query37	0.10	0.07	0.06
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.12	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.02
Total cold run time: 107.17 s
Total hot run time: 32.64 s

LiBinfeng-01 and others added 2 commits January 9, 2025 10:55
apache#40820)

Example: in the original design, select append_trailing_char_if_absent('it','a')
would return NULL; it must never return NULL when the constant is folded
on FE.
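
As a rough illustration of the result expected for the example above, the sketch below appends the trailing string only when it is missing. The class `AppendTrailingDemo` and the method `appendTrailingIfAbsent` are hypothetical names, and the sketch only models the simple case from the example; it is not the Doris implementation and does not cover the function's other rules.

```java
public class AppendTrailingDemo {
    // Hypothetical helper: return s unchanged if it already ends with
    // `trailing`, otherwise return s with `trailing` appended.
    public static String appendTrailingIfAbsent(String s, String trailing) {
        return s.endsWith(trailing) ? s : s + trailing;
    }

    public static void main(String[] args) {
        System.out.println(appendTrailingIfAbsent("it", "a"));  // ita
        System.out.println(appendTrailingIfAbsent("ita", "a")); // ita
    }
}
```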
…oldConst. (apache#40947)

```
mysql [(none)]>set debug_skip_fold_constant = false;
Query OK, 0 rows affected (0.00 sec)

mysql [(none)]>select length('你');
+---------------+
| length('你')  |
+---------------+
|             1 |
+---------------+
1 row in set (0.01 sec)

mysql [(none)]>set debug_skip_fold_constant = true;
Query OK, 0 rows affected (0.00 sec)

mysql [(none)]>select length('你');
+---------------+
| length('你')  |
+---------------+
|             3 |
+---------------+
```
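
The two results above (1 with folding enabled, 3 with folding skipped) match the difference between counting characters and counting UTF-8 bytes. The sketch below shows that difference in plain Java. It is an illustration under that assumption, not the Doris code, and `LengthDemo` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    public static int charLength(String s) {
        return s.length();
    }

    // Number of bytes in the UTF-8 encoding of the string.
    public static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "\u4f60" is the character '你', written as an escape so the result
        // does not depend on the source file encoding.
        String s = "\u4f60";
        System.out.println(charLength(s));     // 1
        System.out.println(utf8ByteLength(s)); // 3
    }
}
```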

@LiBinfeng-01
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40955 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f4c21ea76deba0ee22eb7aec54ac95952c4ae3ad, data reload: false

------ Round 1 ----------------------------------
q1	17572	8120	7284	7284
q2	2066	179	189	179
q3	10577	1089	1152	1089
q4	10575	808	735	735
q5	7753	2869	2839	2839
q6	238	147	145	145
q7	968	614	623	614
q8	9355	1951	2023	1951
q9	6693	6467	6457	6457
q10	7032	2278	2295	2278
q11	464	259	263	259
q12	405	213	222	213
q13	17782	2970	2986	2970
q14	247	210	220	210
q15	558	533	534	533
q16	699	618	603	603
q17	979	559	546	546
q18	7384	6677	6596	6596
q19	1407	1052	1092	1052
q20	483	204	198	198
q21	4072	3269	3213	3213
q22	1094	996	991	991
Total cold run time: 108403 ms
Total hot run time: 40955 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7245	7242	7236	7236
q2	327	234	228	228
q3	2965	2920	2911	2911
q4	2120	1864	1771	1771
q5	5777	5746	5729	5729
q6	237	140	140	140
q7	2235	1776	1847	1776
q8	3337	3583	3534	3534
q9	8856	8979	8842	8842
q10	3595	3580	3530	3530
q11	612	504	495	495
q12	858	579	613	579
q13	10206	3236	3209	3209
q14	301	270	273	270
q15	590	533	530	530
q16	748	687	700	687
q17	1848	1637	1631	1631
q18	8262	7740	7697	7697
q19	1678	1553	1416	1416
q20	2089	1845	1907	1845
q21	5658	5255	5366	5255
q22	1129	1029	1014	1014
Total cold run time: 70673 ms
Total hot run time: 60325 ms

@doris-robot

TPC-DS: Total hot run time: 196826 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f4c21ea76deba0ee22eb7aec54ac95952c4ae3ad, data reload: false

query1	1286	888	883	883
query2	6253	2055	2060	2055
query3	10815	4239	4237	4237
query4	66484	29025	23499	23499
query5	4981	437	477	437
query6	414	177	170	170
query7	5648	306	301	301
query8	321	229	240	229
query9	9209	2676	2687	2676
query10	484	270	261	261
query11	17590	15311	15767	15311
query12	154	103	102	102
query13	1530	439	436	436
query14	9829	7767	6822	6822
query15	215	179	190	179
query16	7063	470	532	470
query17	1098	589	589	589
query18	1698	334	334	334
query19	245	165	159	159
query20	122	113	116	113
query21	203	102	102	102
query22	4559	4605	4326	4326
query23	34838	34739	34103	34103
query24	6156	2900	2986	2900
query25	516	438	431	431
query26	663	167	167	167
query27	1856	360	366	360
query28	3975	2494	2466	2466
query29	725	488	438	438
query30	236	159	166	159
query31	999	829	881	829
query32	69	54	57	54
query33	423	269	273	269
query34	929	510	537	510
query35	864	727	727	727
query36	1055	971	958	958
query37	120	74	72	72
query38	4175	3991	4147	3991
query39	1520	1479	1472	1472
query40	207	102	97	97
query41	48	47	52	47
query42	117	102	98	98
query43	548	493	500	493
query44	1226	840	839	839
query45	190	177	167	167
query46	1187	728	744	728
query47	2028	1918	1968	1918
query48	469	416	405	405
query49	730	393	424	393
query50	843	418	452	418
query51	7432	7195	7074	7074
query52	97	88	86	86
query53	256	182	179	179
query54	558	456	466	456
query55	76	77	75	75
query56	257	250	234	234
query57	1231	1136	1110	1110
query58	218	204	206	204
query59	3267	3110	2893	2893
query60	263	250	250	250
query61	114	110	108	108
query62	768	645	669	645
query63	219	186	186	186
query64	1364	661	627	627
query65	3241	3177	3210	3177
query66	632	292	321	292
query67	16049	15607	15505	15505
query68	4012	582	579	579
query69	409	263	273	263
query70	1104	1153	1146	1146
query71	355	256	260	256
query72	6434	4211	4008	4008
query73	747	340	354	340
query74	10157	8898	8962	8898
query75	3321	2635	2637	2635
query76	1826	1072	1193	1072
query77	463	265	263	263
query78	10546	9650	9671	9650
query79	1617	600	599	599
query80	1034	431	422	422
query81	541	247	237	237
query82	208	114	121	114
query83	163	138	145	138
query84	284	80	78	78
query85	969	301	281	281
query86	412	309	271	271
query87	4478	4358	4173	4173
query88	3933	2383	2364	2364
query89	413	293	293	293
query90	1909	186	197	186
query91	186	147	175	147
query92	63	49	48	48
query93	2419	554	564	554
query94	773	292	294	292
query95	352	258	253	253
query96	614	275	277	275
query97	3318	3195	3276	3195
query98	211	215	205	205
query99	1564	1338	1290	1290
Total cold run time: 317790 ms
Total hot run time: 196826 ms

@doris-robot

ClickBench: Total hot run time: 31.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f4c21ea76deba0ee22eb7aec54ac95952c4ae3ad, data reload: false

query1	0.03	0.04	0.03
query2	0.06	0.03	0.04
query3	0.24	0.06	0.06
query4	1.63	0.11	0.10
query5	0.52	0.52	0.50
query6	1.15	0.73	0.73
query7	0.02	0.01	0.02
query8	0.04	0.04	0.03
query9	0.56	0.51	0.48
query10	0.55	0.55	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.12
query13	0.62	0.60	0.60
query14	3.05	3.09	2.94
query15	0.91	0.83	0.83
query16	0.37	0.39	0.39
query17	1.01	1.05	1.05
query18	0.24	0.21	0.22
query19	1.88	1.92	1.91
query20	0.02	0.01	0.01
query21	15.36	0.58	0.56
query22	2.57	2.24	1.32
query23	17.01	1.01	0.76
query24	3.35	0.38	0.85
query25	0.30	0.21	0.08
query26	0.23	0.14	0.13
query27	0.05	0.04	0.04
query28	11.11	1.12	1.07
query29	12.56	3.27	3.31
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.23	0.46	0.46
query33	3.01	3.01	3.00
query34	17.06	4.50	4.45
query35	4.53	4.49	4.48
query36	0.68	0.50	0.48
query37	0.09	0.05	0.06
query38	0.05	0.03	0.03
query39	0.03	0.03	0.02
query40	0.15	0.12	0.13
query41	0.07	0.02	0.03
query42	0.04	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 107.8 s
Total hot run time: 31.89 s

@morrySnow morrySnow merged commit b51a8f1 into apache:branch-3.0 Jan 9, 2025
20 of 21 checks passed