Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix](Variant) fix some nested explode_variant_array bug and add more… #44533

Merged
merged 2 commits into from
Dec 2, 2024

Conversation

eldenmoon
Copy link
Member

… test

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Copy link
Member Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon eldenmoon force-pushed the fix-variant-nested-explode branch from 5c60c45 to a2a985c Compare November 25, 2024 10:54
@eldenmoon
Copy link
Member Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 40007 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a2a985cdaece73437cda2f622c558b1807b14c51, data reload: false

------ Round 1 ----------------------------------
q1	17633	7758	7346	7346
q2	2057	180	182	180
q3	10679	1154	1141	1141
q4	10516	733	760	733
q5	7590	2731	2735	2731
q6	240	151	147	147
q7	998	624	604	604
q8	9246	1837	1922	1837
q9	6582	6426	6393	6393
q10	6961	2335	2285	2285
q11	454	251	258	251
q12	411	222	219	219
q13	17784	3006	2988	2988
q14	247	211	209	209
q15	547	521	520	520
q16	645	565	584	565
q17	978	523	515	515
q18	7548	6721	6686	6686
q19	1326	972	1083	972
q20	472	184	177	177
q21	3969	3266	3192	3192
q22	374	317	316	316
Total cold run time: 107257 ms
Total hot run time: 40007 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7274	7287	7280	7280
q2	327	231	227	227
q3	2908	2793	3142	2793
q4	2067	1827	1793	1793
q5	5602	5662	5644	5644
q6	218	139	136	136
q7	2205	1778	1836	1778
q8	3400	3515	3519	3515
q9	8826	8879	8924	8879
q10	3602	3554	3544	3544
q11	608	505	486	486
q12	843	635	636	635
q13	8747	3202	3181	3181
q14	303	295	272	272
q15	553	492	517	492
q16	663	634	653	634
q17	1831	1595	1565	1565
q18	7924	7524	7442	7442
q19	1680	1467	1586	1467
q20	2032	1813	1825	1813
q21	5256	5283	5110	5110
q22	641	589	527	527
Total cold run time: 67510 ms
Total hot run time: 59213 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 190830 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a2a985cdaece73437cda2f622c558b1807b14c51, data reload: false

query1	949	368	369	368
query2	6522	2103	2043	2043
query3	6707	208	212	208
query4	33967	23439	23539	23439
query5	4390	474	446	446
query6	279	205	191	191
query7	4635	296	309	296
query8	295	236	226	226
query9	9547	2724	2744	2724
query10	489	271	252	252
query11	18238	15328	15140	15140
query12	158	105	103	103
query13	1655	412	437	412
query14	9767	7117	7407	7117
query15	325	184	182	182
query16	8210	495	489	489
query17	1833	580	575	575
query18	2137	297	302	297
query19	370	149	142	142
query20	120	114	110	110
query21	212	100	107	100
query22	4689	4232	4319	4232
query23	35683	34059	34310	34059
query24	11436	2426	2419	2419
query25	669	391	382	382
query26	1783	147	148	147
query27	2806	286	271	271
query28	8145	2482	2457	2457
query29	1022	402	408	402
query30	300	150	155	150
query31	1049	787	838	787
query32	91	58	56	56
query33	773	295	277	277
query34	1027	505	517	505
query35	867	727	706	706
query36	1095	960	962	960
query37	277	73	78	73
query38	4366	4219	4246	4219
query39	1491	1420	1410	1410
query40	281	95	99	95
query41	50	45	44	44
query42	107	98	100	98
query43	535	489	453	453
query44	1216	824	818	818
query45	185	163	164	163
query46	1147	687	697	687
query47	1976	1835	1847	1835
query48	415	309	310	309
query49	1286	382	375	375
query50	818	393	382	382
query51	7329	7023	7227	7023
query52	103	91	88	88
query53	258	179	179	179
query54	1148	400	403	400
query55	79	79	76	76
query56	286	241	250	241
query57	1308	1212	1193	1193
query58	231	225	230	225
query59	3152	3203	3019	3019
query60	284	240	259	240
query61	136	108	114	108
query62	868	689	682	682
query63	211	195	186	186
query64	5092	654	618	618
query65	3305	3179	3197	3179
query66	1404	321	312	312
query67	15964	15780	15533	15533
query68	5109	553	553	553
query69	432	250	259	250
query70	1127	1135	1147	1135
query71	317	271	243	243
query72	6374	4039	4090	4039
query73	761	354	372	354
query74	10329	9011	8920	8920
query75	3490	2648	2629	2629
query76	3267	1076	1022	1022
query77	511	275	278	275
query78	10527	9515	9399	9399
query79	2589	602	629	602
query80	1153	453	433	433
query81	561	233	234	233
query82	733	117	202	117
query83	259	153	143	143
query84	237	81	77	77
query85	1420	301	295	295
query86	442	297	274	274
query87	4605	4532	4554	4532
query88	3786	2240	2207	2207
query89	410	286	286	286
query90	2174	186	181	181
query91	138	105	104	104
query92	68	51	51	51
query93	1544	543	554	543
query94	1042	295	288	288
query95	359	272	249	249
query96	614	282	281	281
query97	2848	2691	2736	2691
query98	216	197	196	196
query99	1630	1314	1295	1295
Total cold run time: 306620 ms
Total hot run time: 190830 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.32% (9977/26033)
Line Coverage: 29.42% (83484/283788)
Region Coverage: 28.60% (42985/150303)
Branch Coverage: 25.17% (21831/86724)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a2a985cdaece73437cda2f622c558b1807b14c51_a2a985cdaece73437cda2f622c558b1807b14c51/report/index.html

@doris-robot
Copy link

ClickBench: Total hot run time: 31.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a2a985cdaece73437cda2f622c558b1807b14c51, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.04	0.03
query3	0.24	0.09	0.07
query4	1.59	0.11	0.10
query5	0.44	0.38	0.42
query6	1.17	0.69	0.64
query7	0.02	0.02	0.02
query8	0.05	0.03	0.03
query9	0.57	0.52	0.49
query10	0.55	0.57	0.55
query11	0.14	0.10	0.10
query12	0.13	0.11	0.10
query13	0.60	0.61	0.60
query14	2.70	2.71	2.69
query15	0.90	0.82	0.82
query16	0.37	0.38	0.39
query17	1.03	1.06	1.06
query18	0.22	0.21	0.20
query19	1.97	1.84	1.99
query20	0.02	0.02	0.01
query21	15.36	0.57	0.59
query22	2.44	1.70	1.21
query23	17.10	1.07	0.86
query24	2.90	0.28	1.59
query25	0.25	0.17	0.14
query26	0.30	0.14	0.13
query27	0.03	0.04	0.04
query28	11.00	1.08	1.08
query29	12.54	3.18	3.21
query30	0.25	0.07	0.06
query31	2.87	0.38	0.36
query32	3.28	0.48	0.46
query33	3.09	3.02	3.04
query34	17.03	4.44	4.48
query35	4.48	4.48	4.50
query36	0.67	0.49	0.47
query37	0.08	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.12	0.13
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.85 s
Total hot run time: 31.22 s

@eldenmoon eldenmoon force-pushed the fix-variant-nested-explode branch from a2a985c to fee282a Compare November 25, 2024 14:37
@eldenmoon
Copy link
Member Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 39658 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fee282a1bc0b52ec69d0cec40548f4dc6153dca0, data reload: false

------ Round 1 ----------------------------------
q1	17614	7434	7255	7255
q2	2042	181	171	171
q3	10794	1051	1166	1051
q4	10574	717	780	717
q5	7613	2690	2744	2690
q6	243	150	148	148
q7	984	629	601	601
q8	9230	1855	1866	1855
q9	6516	6383	6361	6361
q10	6979	2295	2301	2295
q11	465	251	262	251
q12	417	224	215	215
q13	17759	2980	3019	2980
q14	241	212	216	212
q15	573	511	525	511
q16	665	580	599	580
q17	965	501	552	501
q18	7198	6668	6810	6668
q19	1328	969	994	969
q20	459	188	180	180
q21	3931	3131	3187	3131
q22	381	316	317	316
Total cold run time: 106971 ms
Total hot run time: 39658 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7220	7220	7263	7220
q2	324	229	220	220
q3	2879	2787	2855	2787
q4	2032	1740	1745	1740
q5	5646	5735	5591	5591
q6	220	138	141	138
q7	2247	1758	1771	1758
q8	3344	3527	3461	3461
q9	8923	8898	8802	8802
q10	3612	3527	3525	3525
q11	592	500	509	500
q12	818	610	672	610
q13	11439	3241	3260	3241
q14	302	260	267	260
q15	571	525	526	525
q16	687	634	665	634
q17	1847	1639	1613	1613
q18	8203	7772	7840	7772
q19	1682	1608	1451	1451
q20	2113	1862	1862	1862
q21	5548	5437	5405	5405
q22	639	592	589	589
Total cold run time: 70888 ms
Total hot run time: 59704 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.34% (9980/26033)
Line Coverage: 29.44% (83534/283788)
Region Coverage: 28.61% (42998/150303)
Branch Coverage: 25.18% (21834/86724)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fee282a1bc0b52ec69d0cec40548f4dc6153dca0_fee282a1bc0b52ec69d0cec40548f4dc6153dca0/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 196719 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fee282a1bc0b52ec69d0cec40548f4dc6153dca0, data reload: false

query1	1282	950	914	914
query2	6243	2124	2123	2123
query3	10815	3983	4026	3983
query4	67696	29015	23685	23685
query5	4960	437	437	437
query6	404	175	169	169
query7	5486	290	280	280
query8	302	224	212	212
query9	8397	2708	2696	2696
query10	409	254	234	234
query11	16941	15261	15798	15261
query12	165	105	110	105
query13	1524	443	431	431
query14	10113	7035	7534	7035
query15	212	186	182	182
query16	7179	476	523	476
query17	1101	607	561	561
query18	1845	309	288	288
query19	213	156	149	149
query20	128	114	111	111
query21	203	106	105	105
query22	4778	4750	4696	4696
query23	34692	34653	34223	34223
query24	5411	2535	2446	2446
query25	489	401	400	400
query26	644	147	155	147
query27	1832	312	285	285
query28	4585	2519	2508	2508
query29	663	415	428	415
query30	214	150	146	146
query31	1000	801	845	801
query32	66	60	59	59
query33	447	294	291	291
query34	933	521	521	521
query35	852	721	725	721
query36	1077	943	983	943
query37	126	77	75	75
query38	4300	4336	4454	4336
query39	1509	1481	1476	1476
query40	202	95	96	95
query41	48	45	48	45
query42	109	99	97	97
query43	574	511	506	506
query44	1211	861	854	854
query45	180	180	164	164
query46	1132	708	717	708
query47	2056	1922	1940	1922
query48	428	325	344	325
query49	716	401	408	401
query50	838	392	404	392
query51	7425	7200	7073	7073
query52	97	87	88	87
query53	256	179	178	178
query54	503	390	387	387
query55	79	74	72	72
query56	249	242	230	230
query57	1315	1149	1128	1128
query58	215	225	221	221
query59	3285	3106	3099	3099
query60	295	258	269	258
query61	136	131	133	131
query62	791	670	665	665
query63	209	186	198	186
query64	1468	717	650	650
query65	3281	3203	3221	3203
query66	704	297	292	292
query67	16180	15601	15759	15601
query68	4008	572	563	563
query69	426	252	258	252
query70	1186	1126	1113	1113
query71	351	249	261	249
query72	6344	4058	3962	3962
query73	747	354	349	349
query74	10176	8913	9159	8913
query75	3386	2674	2676	2674
query76	2004	1015	1095	1015
query77	481	276	264	264
query78	10650	9425	9435	9425
query79	1698	585	594	585
query80	883	423	433	423
query81	506	235	219	219
query82	1275	118	115	115
query83	276	148	144	144
query84	281	84	74	74
query85	922	314	300	300
query86	348	309	301	301
query87	4690	4599	4522	4522
query88	3726	2234	2192	2192
query89	416	295	299	295
query90	2069	182	191	182
query91	131	103	105	103
query92	65	52	49	49
query93	1977	547	550	547
query94	789	300	295	295
query95	370	268	247	247
query96	617	295	278	278
query97	2802	2679	2689	2679
query98	228	208	207	207
query99	1595	1287	1290	1287
Total cold run time: 318641 ms
Total hot run time: 196719 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 32.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fee282a1bc0b52ec69d0cec40548f4dc6153dca0, data reload: false

query1	0.03	0.03	0.02
query2	0.06	0.03	0.03
query3	0.23	0.07	0.07
query4	1.61	0.10	0.10
query5	0.42	0.41	0.41
query6	1.13	0.66	0.65
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.58	0.51	0.52
query10	0.55	0.54	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.61	0.61
query14	2.70	2.79	2.80
query15	0.92	0.83	0.82
query16	0.39	0.39	0.43
query17	1.06	1.03	1.07
query18	0.21	0.21	0.21
query19	1.86	1.77	2.03
query20	0.02	0.01	0.01
query21	15.39	0.61	0.60
query22	2.84	2.75	1.93
query23	17.04	0.88	0.78
query24	2.93	0.76	2.05
query25	0.36	0.19	0.04
query26	0.48	0.14	0.14
query27	0.06	0.04	0.05
query28	10.37	1.10	1.08
query29	12.55	3.24	3.26
query30	0.25	0.07	0.06
query31	2.85	0.38	0.38
query32	3.29	0.46	0.47
query33	3.05	3.02	3.00
query34	17.03	4.43	4.46
query35	4.47	4.50	4.53
query36	0.66	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.03	0.03
query39	0.04	0.03	0.02
query40	0.18	0.13	0.13
query41	0.07	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.83 s
Total hot run time: 32.44 s

// 1. array_contains
qt_sql "select * from var_nested_array_agg where array_contains(cast(v['nested']['xx'] as array<int>), 10) order by k limit 10"
// 2. array_agg scalar
sql "select k, array_agg(cast(v['nested'] as text)) from var_nested_array_agg group by k limit 10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qt_sql ? just sql will not show the output in out file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the output is not stable since the serailized v['nested'] may contains blank and is not stable, which is acceptable at present

_array_column = ColumnNullable::create(ColumnArray::create(ColumnNothing::create(0)),
ColumnUInt8::create(0));
_array_column->assume_mutable()->insert_many_defaults(variant_column->size());
_detail.nested_type = std::make_shared<DataTypeNothing>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe here should wrap in nullable , because many calculate operator behavior just make nested column in array as nullable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's redundant in column object case

Copy link
Contributor

@amorynan amorynan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@qidaye qidaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 2, 2024
Copy link
Contributor

github-actions bot commented Dec 2, 2024

PR approved by at least one committer and no changes requested.

@eldenmoon eldenmoon merged commit 1abfd10 into apache:master Dec 2, 2024
30 of 37 checks passed
@@ -37,6 +42,34 @@ VExplodeTableFunction::VExplodeTableFunction() {
_fn_name = "vexplode";
}

Status VExplodeTableFunction::_process_init_variant(Block* block, int value_column_idx) {
// explode variant array
const auto& variant_column = check_and_get_column<ColumnObject>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's bug which will lead to coredump

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.4-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants