[enhancement](utf8) import enable_text_validate_utf8 session var (#45537) #46071

Merged
1 commit merged on Dec 27, 2024

Conversation

hubgeter
Contributor

@hubgeter hubgeter commented Dec 27, 2024

Backport of #45537.
Problem Summary:
When reading text-format files through a Hive catalog or a TVF, you may sometimes encounter the exception `Only support csv data in utf8 codec`. I introduced a new session variable, `enable_text_validate_utf8`, to control whether the data is checked for valid UTF-8 encoding.
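
To make the idea concrete, here is a minimal Python sketch of what an optional UTF-8 check on text data could look like. It is not the Doris implementation: the function names and the `validate_utf8` parameter are hypothetical stand-ins for the behavior that `enable_text_validate_utf8` toggles, and only the error message is taken from the description above.

```python
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_text_lines(data: bytes, validate_utf8: bool = True) -> List[bytes]:
    """Split raw file bytes into lines, optionally rejecting non-UTF-8 input.

    `validate_utf8` is a hypothetical flag that plays the role of the
    session variable: when it is False, the encoding check is skipped
    and the bytes are passed through unchanged.
    """
    if validate_utf8 and not is_valid_utf8(data):
        # Error text borrowed from the exception quoted in the PR description.
        raise ValueError("Only support csv data in utf8 codec")
    return data.splitlines()


# A Latin-1 encoded row: the byte 0xE9 is not valid UTF-8 in this position.
latin1_row = "café,1\n".encode("latin-1")

# With validation off, the row is returned as raw bytes.
rows = read_text_lines(latin1_row, validate_utf8=False)

# With validation on (the default here), the same input is rejected.
try:
    read_text_lines(latin1_row)
    rejected = False
except ValueError:
    rejected = True
```

In this sketch, skipping the check means downstream code receives bytes that may not be valid UTF-8, so turning validation off shifts responsibility for the encoding onto whoever consumes the data.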

Release note

Introduced the `enable_text_validate_utf8` session variable to control whether text data is checked for valid UTF-8 encoding.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…#45537)

Problem Summary:
When reading text format files in Hive catalog and TVF, sometimes you
may encounter the exception `Only support csv data in utf8 codec`.
I introduced a new session variable `enable_text_validate_utf8` to
control whether to check the utf8 format.

Introduced `enable_text_validate_utf8` session variable to control
whether to check the utf8 format.
@Thearas
Contributor

Thearas commented Dec 27, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41185 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6596ceaef0e2e348e8397f4d241168dfba1520c0, data reload: false

------ Round 1 ----------------------------------
q1	17559	7513	7344	7344
q2	2047	193	173	173
q3	10649	1098	1132	1098
q4	10582	804	737	737
q5	7761	2904	2917	2904
q6	245	147	147	147
q7	986	616	605	605
q8	9350	2029	2071	2029
q9	6606	6498	6455	6455
q10	6998	2357	2323	2323
q11	471	260	261	260
q12	399	219	218	218
q13	17775	2958	2992	2958
q14	238	206	207	206
q15	571	516	525	516
q16	704	618	603	603
q17	992	588	635	588
q18	7429	6659	6697	6659
q19	1391	1033	1047	1033
q20	464	197	199	197
q21	3981	3226	3123	3123
q22	1120	1009	1026	1009
Total cold run time: 108318 ms
Total hot run time: 41185 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7293	7295	7298	7295
q2	325	236	231	231
q3	3000	2947	2969	2947
q4	2102	1894	1825	1825
q5	5731	5788	5800	5788
q6	227	146	145	145
q7	2256	1846	1807	1807
q8	3428	3576	3468	3468
q9	8923	8916	8930	8916
q10	3602	3584	3563	3563
q11	604	503	502	502
q12	835	657	611	611
q13	10502	3161	3138	3138
q14	288	268	266	266
q15	584	544	526	526
q16	701	648	669	648
q17	1901	1621	1609	1609
q18	8131	7685	7709	7685
q19	1711	1610	1584	1584
q20	2100	1879	1907	1879
q21	5602	5485	5331	5331
q22	1155	1013	1079	1013
Total cold run time: 71001 ms
Total hot run time: 60777 ms

@doris-robot

TPC-DS: Total hot run time: 197984 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6596ceaef0e2e348e8397f4d241168dfba1520c0, data reload: false

query1	1315	948	909	909
query2	6251	2162	2081	2081
query3	10967	4499	4276	4276
query4	67469	29247	23412	23412
query5	5122	449	466	449
query6	413	192	190	190
query7	5581	306	305	305
query8	318	245	226	226
query9	9081	2699	2698	2698
query10	474	272	259	259
query11	17275	15384	15756	15384
query12	164	105	101	101
query13	1538	450	433	433
query14	10512	7097	7417	7097
query15	199	176	191	176
query16	7199	490	488	488
query17	1043	600	643	600
query18	1781	334	315	315
query19	210	163	162	162
query20	118	110	111	110
query21	61	48	47	47
query22	4740	4762	4672	4672
query23	34929	34874	34328	34328
query24	6212	2930	2916	2916
query25	533	422	415	415
query26	664	177	176	176
query27	1816	311	316	311
query28	4238	2536	2468	2468
query29	723	486	454	454
query30	276	161	159	159
query31	994	835	869	835
query32	68	58	53	53
query33	406	292	295	292
query34	887	513	511	511
query35	836	713	738	713
query36	1112	969	961	961
query37	121	75	72	72
query38	4063	4074	4025	4025
query39	1511	1495	1465	1465
query40	138	83	80	80
query41	52	55	48	48
query42	111	96	93	93
query43	536	500	503	500
query44	1207	848	843	843
query45	186	166	167	166
query46	1153	746	726	726
query47	2014	1920	1972	1920
query48	476	368	391	368
query49	749	395	384	384
query50	812	434	427	427
query51	7381	7402	7219	7219
query52	95	86	85	85
query53	247	175	181	175
query54	551	464	444	444
query55	78	73	73	73
query56	260	232	247	232
query57	1217	1132	1117	1117
query58	207	208	227	208
query59	3385	2965	2948	2948
query60	279	245	244	244
query61	109	111	113	111
query62	767	672	657	657
query63	217	192	190	190
query64	1418	659	624	624
query65	3279	3193	3173	3173
query66	705	303	304	303
query67	16021	15704	15629	15629
query68	4148	567	563	563
query69	421	268	263	263
query70	1171	1127	1035	1035
query71	350	250	253	250
query72	6319	4097	4049	4049
query73	748	354	346	346
query74	9402	9044	8976	8976
query75	3356	2636	2679	2636
query76	1859	1112	1088	1088
query77	530	264	279	264
query78	10590	9570	9540	9540
query79	1258	611	603	603
query80	807	422	431	422
query81	528	241	239	239
query82	1305	121	113	113
query83	235	143	141	141
query84	285	82	77	77
query85	867	312	288	288
query86	340	298	302	298
query87	4363	4235	4409	4235
query88	3470	2399	2353	2353
query89	407	287	297	287
query90	1986	188	187	187
query91	172	145	147	145
query92	66	52	49	49
query93	1259	540	544	540
query94	774	284	297	284
query95	343	250	257	250
query96	611	280	280	280
query97	3314	3163	3160	3160
query98	210	211	194	194
query99	1912	1335	1298	1298
Total cold run time: 318276 ms
Total hot run time: 197984 ms

@doris-robot

ClickBench: Total hot run time: 33.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6596ceaef0e2e348e8397f4d241168dfba1520c0, data reload: false

query1	0.04	0.03	0.02
query2	0.08	0.05	0.04
query3	0.23	0.06	0.06
query4	1.64	0.08	0.08
query5	0.51	0.50	0.51
query6	1.14	0.75	0.75
query7	0.03	0.01	0.02
query8	0.06	0.05	0.05
query9	0.56	0.51	0.49
query10	0.55	0.56	0.56
query11	0.18	0.12	0.13
query12	0.16	0.13	0.12
query13	0.62	0.60	0.59
query14	3.03	2.97	2.96
query15	0.91	0.82	0.85
query16	0.39	0.39	0.38
query17	1.09	1.06	0.98
query18	0.20	0.19	0.19
query19	1.93	1.93	1.96
query20	0.02	0.02	0.02
query21	15.36	0.68	0.67
query22	4.53	6.74	1.98
query23	18.32	1.33	1.33
query24	2.18	0.23	0.23
query25	0.15	0.08	0.09
query26	0.26	0.18	0.18
query27	0.08	0.08	0.09
query28	13.32	1.16	1.14
query29	12.65	3.35	3.33
query30	0.25	0.07	0.06
query31	2.87	0.41	0.40
query32	3.22	0.49	0.50
query33	3.00	2.99	3.03
query34	16.92	4.56	4.55
query35	4.58	4.63	4.52
query36	0.66	0.48	0.49
query37	0.20	0.17	0.17
query38	0.16	0.15	0.16
query39	0.05	0.04	0.04
query40	0.16	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.05	0.05	0.04
Total cold run time: 112.49 s
Total hot run time: 33.79 s

@morningman morningman merged commit bbef3ec into apache:branch-3.0 Dec 27, 2024
20 of 22 checks passed
4 participants