[opt](hms table)Some optimizations for hms external table for 3.0 (#44909) #46086

Merged: 2 commits into apache:branch-3.0 on Dec 27, 2024

wuwenchi
Contributor

Backport of #44909.

Problem Summary:

1. Increase the schema cache to reduce the time spent obtaining the schema.
2. Store `HoodieTableMetaClient` in `HMSExternalTable` to avoid creating it redundantly.
3. Cache `HoodieTableFileSystemView` to speed up retrieval of FileGroup or FileSlice objects.
4. Fix path analysis for locations of the form `file:/abc`.
5. Add `FSDataInputStreamWrapper` to resolve a Hudi class conflict.
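
The diff itself is not part of this conversation, so the following is only a minimal sketch of the ideas behind items 2 to 4, not the code that was merged. `Memoized`, `KeyedCache`, and `HmsTableSketch` are hypothetical names, and plain strings stand in for `HoodieTableMetaClient` and `HoodieTableFileSystemView`. The last method shows why `file:/abc` can trip up a parser that looks for `://`: the location has a scheme but no authority, which `java.net.URI` handles directly. Whether the PR uses `java.net.URI` for this is an assumption.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper: builds an expensive value once and reuses it afterwards. */
final class Memoized<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Memoized(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T local = value;
        if (local == null) {
            synchronized (this) {
                local = value;
                if (local == null) {
                    // First caller pays the construction cost; later callers reuse the result.
                    local = factory.get();
                    value = local;
                }
            }
        }
        return local;
    }
}

/** Hypothetical keyed cache: one expensive object per key (for example, per table). */
final class KeyedCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    KeyedCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent runs the loader only when the key has no entry yet.
        return entries.computeIfAbsent(key, loader);
    }
}

final class HmsTableSketch {
    /** Splits a location with java.net.URI instead of searching for "://". */
    static String describe(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() + "|" + uri.getAuthority() + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        // Stand-in for a meta client stored on the table object.
        Memoized<String> metaClient = new Memoized<>(() -> "client#" + builds.incrementAndGet());
        metaClient.get();
        metaClient.get();
        System.out.println("builds = " + builds.get());

        // Stand-in for a per-table file system view cache.
        KeyedCache<String, String> views = new KeyedCache<>(table -> "view-of-" + table);
        System.out.println(views.get("db.tbl") == views.get("db.tbl"));

        System.out.println(describe("file:/abc"));
        System.out.println(describe("hdfs://nn:8020/abc"));
    }
}
```

In this sketch the factory runs once even though `get()` is called twice, and `describe("file:/abc")` yields the scheme `file`, a null authority, and the path `/abc`. A production cache would also need eviction and invalidation, which this sketch leaves out.
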
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wuwenchi
Contributor Author

run buildall

@wuwenchi changed the title from "[opt](hms table)Some optimizations for hms external table (#44909)" to "[opt](hms table)Some optimizations for hms external table for 3.0 (#44909)" on Dec 27, 2024
@wuwenchi
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40719 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b97c4dcaf03898c04c1b3f90e5ca6b2e718fda9d, data reload: false

------ Round 1 ----------------------------------
query	cold(ms)	hot1(ms)	hot2(ms)	min_hot(ms)
q1	17620	7357	7224	7224
q2	2051	166	168	166
q3	10574	1112	1193	1112
q4	10229	740	745	740
q5	7730	2844	2844	2844
q6	234	147	146	146
q7	955	613	601	601
q8	9366	1968	1978	1968
q9	6639	6399	6399	6399
q10	6952	2241	2314	2241
q11	451	265	270	265
q12	399	217	206	206
q13	17802	2966	3001	2966
q14	245	210	221	210
q15	558	522	513	513
q16	702	622	604	604
q17	962	611	520	520
q18	7317	6552	6736	6552
q19	1375	1093	1046	1046
q20	464	207	202	202
q21	4034	3296	3196	3196
q22	1137	998	1019	998
Total cold run time: 107796 ms
Total hot run time: 40719 ms
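
The Round 1 totals can be reproduced from the rows above: the first numeric column sums to the cold total, the last column sums to the hot total, and in every row the last column equals the smaller of the two middle values. The check below is a small sketch of that arithmetic; the class and method names are made up for illustration and are not part of the Doris benchmark tooling.

```java
/** Illustrative check of the Round 1 totals; names are hypothetical. */
final class TpchRound1Totals {
    // Rows q1..q22 copied from the Round 1 table, in ms: {col1, col2, col3, col4}.
    static final int[][] ROWS = {
        {17620, 7357, 7224, 7224}, {2051, 166, 168, 166}, {10574, 1112, 1193, 1112},
        {10229, 740, 745, 740}, {7730, 2844, 2844, 2844}, {234, 147, 146, 146},
        {955, 613, 601, 601}, {9366, 1968, 1978, 1968}, {6639, 6399, 6399, 6399},
        {6952, 2241, 2314, 2241}, {451, 265, 270, 265}, {399, 217, 206, 206},
        {17802, 2966, 3001, 2966}, {245, 210, 221, 210}, {558, 522, 513, 513},
        {702, 622, 604, 604}, {962, 611, 520, 520}, {7317, 6552, 6736, 6552},
        {1375, 1093, 1046, 1046}, {464, 207, 202, 202}, {4034, 3296, 3196, 3196},
        {1137, 998, 1019, 998},
    };

    /** Sums one column (0-based) over all rows. */
    static int sumColumn(int col) {
        int total = 0;
        for (int[] row : ROWS) {
            total += row[col];
        }
        return total;
    }

    /** True if, in every row, the last column is the minimum of the two middle columns. */
    static boolean lastColumnIsMinOfMiddle() {
        for (int[] row : ROWS) {
            if (row[3] != Math.min(row[1], row[2])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("sum of first column = " + sumColumn(0));
        System.out.println("sum of last column  = " + sumColumn(3));
        System.out.println("last column is min of middle columns: " + lastColumnIsMinOfMiddle());
    }
}
```

Running it gives 107796 and 40719 for the two sums, which match the reported cold and hot totals.
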

----- Round 2, with runtime_filter_mode=off -----
query	cold(ms)	hot1(ms)	hot2(ms)	min_hot(ms)
q1	7219	7231	7190	7190
q2	334	242	228	228
q3	2918	2973	2914	2914
q4	2091	1792	1863	1792
q5	5721	5697	5719	5697
q6	218	136	137	136
q7	2141	1794	1825	1794
q8	3366	3496	3458	3458
q9	8825	8839	8826	8826
q10	3608	3555	3567	3555
q11	580	505	509	505
q12	779	610	605	605
q13	9083	3136	3165	3136
q14	305	271	285	271
q15	567	523	515	515
q16	688	656	653	653
q17	1789	1599	1559	1559
q18	7790	7355	7493	7355
q19	1648	1493	1459	1459
q20	2040	1779	1819	1779
q21	5403	5041	5264	5041
q22	1110	1026	998	998
Total cold run time: 68223 ms
Total hot run time: 59466 ms

@doris-robot

TPC-DS: Total hot run time: 190976 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b97c4dcaf03898c04c1b3f90e5ca6b2e718fda9d, data reload: false

query	cold(ms)	hot1(ms)	hot2(ms)	min_hot(ms)
query1	962	385	363	363
query2	6507	2184	2129	2129
query3	6705	211	222	211
query4	33975	23459	23401	23401
query5	4279	439	430	430
query6	255	165	175	165
query7	4624	307	318	307
query8	287	234	228	228
query9	9718	2699	2685	2685
query10	475	258	253	253
query11	17985	15044	15095	15044
query12	155	100	102	100
query13	1645	445	407	407
query14	9547	6524	7379	6524
query15	249	172	175	172
query16	8166	487	442	442
query17	1673	565	557	557
query18	2168	317	328	317
query19	375	161	155	155
query20	115	110	112	110
query21	60	46	47	46
query22	4642	4148	4406	4148
query23	35144	34178	33956	33956
query24	11213	2882	2818	2818
query25	682	394	405	394
query26	1550	171	171	171
query27	2804	303	302	302
query28	8140	2464	2460	2460
query29	974	444	435	435
query30	335	167	172	167
query31	1010	807	818	807
query32	93	60	61	60
query33	793	286	289	286
query34	968	485	529	485
query35	849	731	737	731
query36	1096	939	966	939
query37	139	80	77	77
query38	4013	3932	3998	3932
query39	1461	1451	1435	1435
query40	224	87	85	85
query41	55	51	50	50
query42	110	99	102	99
query43	536	503	504	503
query44	1233	823	810	810
query45	187	173	169	169
query46	1123	730	730	730
query47	1953	1847	1851	1847
query48	460	374	379	374
query49	1181	408	399	399
query50	805	404	425	404
query51	7163	7220	7023	7023
query52	101	84	89	84
query53	260	181	180	180
query54	1249	448	440	440
query55	82	79	75	75
query56	255	241	254	241
query57	1250	1120	1150	1120
query58	229	202	214	202
query59	3243	2998	3038	2998
query60	302	258	244	244
query61	108	123	108	108
query62	849	678	656	656
query63	218	187	185	185
query64	5404	647	637	637
query65	3351	3269	3246	3246
query66	1290	305	340	305
query67	16179	15688	15544	15544
query68	4986	570	564	564
query69	419	262	279	262
query70	1193	1118	1136	1118
query71	393	259	255	255
query72	6511	4054	3976	3976
query73	750	349	367	349
query74	10090	9084	9014	9014
query75	3391	2606	2663	2606
query76	2997	1074	1032	1032
query77	408	275	276	275
query78	10405	9571	9633	9571
query79	1116	602	589	589
query80	749	441	425	425
query81	517	248	237	237
query82	1256	118	121	118
query83	240	146	151	146
query84	245	80	82	80
query85	1020	293	289	289
query86	310	300	300	300
query87	4513	4356	4284	4284
query88	3506	2384	2338	2338
query89	406	288	287	287
query90	1962	182	183	182
query91	209	152	151	151
query92	58	51	56	51
query93	1023	547	536	536
query94	717	293	297	293
query95	351	256	250	250
query96	599	283	286	283
query97	3343	3213	3223	3213
query98	217	203	196	196
query99	1541	1324	1299	1299
Total cold run time: 300672 ms
Total hot run time: 190976 ms

@doris-robot

ClickBench: Total hot run time: 32.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b97c4dcaf03898c04c1b3f90e5ca6b2e718fda9d, data reload: false

query	cold(s)	hot1(s)	hot2(s)
query1	0.04	0.03	0.03
query2	0.07	0.03	0.02
query3	0.23	0.07	0.07
query4	1.61	0.10	0.10
query5	0.51	0.51	0.50
query6	1.14	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.49	0.50
query10	0.54	0.55	0.57
query11	0.14	0.10	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.59
query14	2.92	2.90	2.93
query15	0.89	0.82	0.82
query16	0.38	0.37	0.38
query17	1.00	1.07	1.05
query18	0.24	0.23	0.21
query19	1.90	1.86	2.05
query20	0.01	0.01	0.01
query21	15.36	0.58	0.61
query22	2.80	2.80	1.50
query23	16.86	1.00	0.87
query24	3.13	1.45	0.96
query25	0.38	0.10	0.15
query26	0.33	0.14	0.14
query27	0.03	0.04	0.04
query28	10.34	1.11	1.07
query29	12.57	3.20	3.25
query30	0.25	0.05	0.06
query31	2.86	0.39	0.37
query32	3.28	0.46	0.47
query33	2.94	3.02	3.02
query34	16.96	4.49	4.52
query35	4.61	4.55	4.49
query36	0.68	0.51	0.50
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.13	0.13
query41	0.07	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.84 s
Total hot run time: 32.69 s

@morningman morningman merged commit 01dc5bc into apache:branch-3.0 Dec 27, 2024
19 of 20 checks passed