[fix](cloud-mow) Fix sending committing rpc to FE twice problem #41395

Merged (3 commits) on Nov 13, 2024

Conversation

@hust-hhb (Contributor) commented Sep 27, 2024

Here is an example in which the commit RPC is sent twice:

  1. The first commit request tries to acquire the delete bitmap lock. There are two locks (FE and MS), and acquiring them takes longer than the RPC timeout (60s by default). FE does not return DELETE_BITMAP_LOCK_ERR to BE, and it continues to send the calculate delete bitmap task to BE.
  2. BE calculates the delete bitmap successfully and removes the delete bitmap cache.
  3. Because step 1 took more than 60s, BE resends the commit RPC to FE.
  4. After the first commit request finishes, the second commit request from step 3 does the same thing, but the delete bitmap cache has already been deleted by the first commit, so it fails on BE.
  5. The client sees the commit fail.

This PR checks the transaction status before sending the delete bitmap task to BE. If the transaction status is committed or visible, there is no need to recalculate the delete bitmap; FE just returns RPC success to BE.
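The check described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual Doris code: the diff in this PR touches a method named `commitTransaction`, but `CommitRetrySketch`, `TxnStatus`, `beginTxn`, and `calcTasksSent` are hypothetical names, and the in-memory map stands in for the real transaction state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the FE side of a commit RPC.
public class CommitRetrySketch {
    // Hypothetical subset of transaction states.
    public enum TxnStatus { PREPARE, COMMITTED, VISIBLE, ABORTED }

    // Stand-in for the transaction state kept by FE / the meta service.
    private final Map<Long, TxnStatus> txnStatus = new HashMap<>();

    // Counts how often a "calculate delete bitmap" task would be sent to BE.
    public int calcTasksSent = 0;

    public void beginTxn(long txnId) {
        txnStatus.put(txnId, TxnStatus.PREPARE);
    }

    // Returns true when the commit RPC should report success to BE.
    public boolean commitTransaction(long txnId) {
        TxnStatus status = txnStatus.get(txnId);
        if (status == null) {
            throw new IllegalArgumentException("unknown txn " + txnId);
        }
        // A retried commit RPC for a txn that is already committed or visible:
        // skip the delete bitmap recalculation and report success.
        if (status == TxnStatus.COMMITTED || status == TxnStatus.VISIBLE) {
            return true;
        }
        if (status == TxnStatus.ABORTED) {
            return false;
        }
        // First commit attempt: send the delete bitmap task, then commit.
        calcTasksSent++;
        txnStatus.put(txnId, TxnStatus.COMMITTED);
        return true;
    }

    public static void main(String[] args) {
        CommitRetrySketch fe = new CommitRetrySketch();
        fe.beginTxn(1L);
        boolean first = fe.commitTransaction(1L);  // original RPC
        boolean retry = fe.commitTransaction(1L);  // resent after the timeout
        System.out.println(first + " " + retry + " " + fe.calcTasksSent);
    }
}
```

In this sketch the resent RPC succeeds without a second delete bitmap task, so the client no longer sees a failure caused by the already-removed cache.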

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@hust-hhb
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.29% (9626/25811)
Line Coverage: 28.71% (79692/277590)
Region Coverage: 28.13% (41190/146435)
Branch Coverage: 24.75% (20975/84748)
Coverage Report: http://coverage.selectdb-in.cc/coverage/17be146133af7bb10b24868df990f84c10681820_17be146133af7bb10b24868df990f84c10681820/report/index.html

@doris-robot

TPC-H: Total hot run time: 42419 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 17be146133af7bb10b24868df990f84c10681820, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17577	7920	7830	7830
q2	2014	279	267	267
q3	12052	1126	1245	1126
q4	10559	775	762	762
q5	7778	3035	2960	2960
q6	263	153	150	150
q7	1059	626	615	615
q8	9381	2010	2023	2010
q9	6964	6793	6736	6736
q10	7015	2332	2354	2332
q11	448	251	247	247
q12	470	220	213	213
q13	17769	2990	2984	2984
q14	254	210	209	209
q15	564	518	534	518
q16	657	578	573	573
q17	1006	544	598	544
q18	7356	6691	6695	6691
q19	1404	1195	1147	1147
q20	506	211	202	202
q21	4218	3470	3315	3315
q22	1111	1008	988	988
Total cold run time: 110425 ms
Total hot run time: 42419 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7776	7711	7803	7711
q2	358	236	249	236
q3	3193	3074	3077	3074
q4	2183	1851	1800	1800
q5	5802	5815	5870	5815
q6	261	151	144	144
q7	2311	1825	1820	1820
q8	3616	3722	3767	3722
q9	8990	9061	8983	8983
q10	3701	3666	3650	3650
q11	611	502	497	497
q12	869	628	585	585
q13	11415	3183	3159	3159
q14	307	268	269	268
q15	601	514	525	514
q16	722	667	648	648
q17	2012	1756	1703	1703
q18	8325	7695	7463	7463
q19	1851	1691	1714	1691
q20	2155	1887	1898	1887
q21	5785	5549	5616	5549
q22	1144	1069	1027	1027
Total cold run time: 73988 ms
Total hot run time: 61946 ms

@doris-robot

TPC-DS: Total hot run time: 192131 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 17be146133af7bb10b24868df990f84c10681820, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	944	393	396	393
query2	6270	2122	2052	2052
query3	8687	192	204	192
query4	33631	23639	23461	23461
query5	3521	469	460	460
query6	264	168	157	157
query7	4191	306	300	300
query8	282	226	215	215
query9	9331	2662	2647	2647
query10	470	282	289	282
query11	17809	15145	15326	15145
query12	154	101	97	97
query13	1532	438	415	415
query14	9383	7482	7486	7482
query15	259	170	182	170
query16	7442	447	470	447
query17	1695	616	593	593
query18	1783	342	312	312
query19	368	180	152	152
query20	122	112	111	111
query21	214	108	109	108
query22	4861	4696	4429	4429
query23	35741	34135	34088	34088
query24	11020	2864	2836	2836
query25	607	414	404	404
query26	1169	162	160	160
query27	2376	304	295	295
query28	7366	2441	2389	2389
query29	776	439	441	439
query30	288	157	156	156
query31	1039	812	806	806
query32	101	60	59	59
query33	753	306	308	306
query34	934	509	504	504
query35	890	732	724	724
query36	1127	956	973	956
query37	154	93	93	93
query38	3970	4013	3869	3869
query39	1497	1426	1411	1411
query40	208	101	102	101
query41	53	48	48	48
query42	113	95	99	95
query43	536	481	505	481
query44	1266	794	795	794
query45	199	166	173	166
query46	1135	702	707	702
query47	1902	1827	1800	1800
query48	468	375	379	375
query49	910	413	416	413
query50	847	411	410	410
query51	7062	6902	6929	6902
query52	105	92	87	87
query53	251	186	192	186
query54	1055	467	474	467
query55	79	73	78	73
query56	299	268	270	268
query57	1203	1098	1111	1098
query58	245	242	242	242
query59	3392	3072	3045	3045
query60	297	266	269	266
query61	108	107	103	103
query62	836	681	675	675
query63	227	193	185	185
query64	4025	658	631	631
query65	3299	3195	3230	3195
query66	732	312	308	308
query67	15910	15561	15383	15383
query68	5062	586	559	559
query69	545	298	300	298
query70	1180	1124	1127	1124
query71	402	297	271	271
query72	7479	4017	3981	3981
query73	789	345	345	345
query74	10131	8921	9070	8921
query75	3635	2703	2724	2703
query76	3339	867	901	867
query77	641	301	301	301
query78	10547	9705	9607	9607
query79	4451	578	600	578
query80	2090	451	448	448
query81	601	241	240	240
query82	845	140	140	140
query83	312	158	140	140
query84	275	74	80	74
query85	2133	301	286	286
query86	454	307	301	301
query87	4434	4341	4326	4326
query88	4122	2348	2334	2334
query89	421	303	290	290
query90	2026	188	186	186
query91	183	139	142	139
query92	62	47	48	47
query93	3660	562	538	538
query94	918	287	286	286
query95	355	260	259	259
query96	631	278	286	278
query97	3257	3168	3154	3154
query98	219	197	203	197
query99	1730	1299	1332	1299
Total cold run time: 306712 ms
Total hot run time: 192131 ms

@doris-robot

ClickBench: Total hot run time: 33.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 17be146133af7bb10b24868df990f84c10681820, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.06	0.02	0.03
query3	0.23	0.06	0.07
query4	1.65	0.10	0.10
query5	0.50	0.51	0.52
query6	1.13	0.73	0.72
query7	0.01	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.49	0.48
query10	0.54	0.57	0.53
query11	0.14	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.59	0.59
query14	2.72	2.70	2.73
query15	0.90	0.83	0.82
query16	0.37	0.38	0.38
query17	1.06	1.03	1.05
query18	0.20	0.20	0.20
query19	1.93	1.88	1.96
query20	0.01	0.01	0.01
query21	15.36	0.59	0.57
query22	3.02	3.03	2.56
query23	16.97	1.00	0.90
query24	2.83	0.95	0.96
query25	0.15	0.19	0.16
query26	0.40	0.16	0.15
query27	0.04	0.03	0.04
query28	10.78	1.10	1.08
query29	12.57	3.19	3.21
query30	0.25	0.06	0.05
query31	2.88	0.39	0.38
query32	3.27	0.48	0.48
query33	2.98	3.03	2.99
query34	17.07	4.39	4.47
query35	4.52	4.48	4.50
query36	0.66	0.48	0.49
query37	0.09	0.06	0.05
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.13	0.12
query41	0.07	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.08 s
Total hot run time: 33.47 s

@hust-hhb
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.25% (9646/25894)
Line Coverage: 28.55% (79972/280114)
Region Coverage: 27.99% (41346/147738)
Branch Coverage: 24.60% (21051/85588)
Coverage Report: http://coverage.selectdb-in.cc/coverage/17be146133af7bb10b24868df990f84c10681820_17be146133af7bb10b24868df990f84c10681820/report/index.html

@hust-hhb
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.40% (9660/25827)
Line Coverage: 28.69% (80174/279478)
Region Coverage: 28.10% (41440/147449)
Branch Coverage: 24.71% (21108/85416)
Coverage Report: http://coverage.selectdb-in.cc/coverage/17be146133af7bb10b24868df990f84c10681820_17be146133af7bb10b24868df990f84c10681820/report/index.html

@hust-hhb
Contributor Author

hust-hhb commented Nov 7, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41651 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 51e308659e8cdd89cb05370ee128f34c1a754bb6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17608	7535	7368	7368
q2	2099	190	177	177
q3	10569	1152	1205	1152
q4	10238	922	820	820
q5	7750	3161	3099	3099
q6	235	147	147	147
q7	1024	624	603	603
q8	9382	2055	2086	2055
q9	6631	6479	6472	6472
q10	7093	2406	2452	2406
q11	471	260	261	260
q12	404	218	219	218
q13	17779	3053	2998	2998
q14	233	215	217	215
q15	577	537	509	509
q16	636	578	597	578
q17	991	477	659	477
q18	7510	6692	6757	6692
q19	1350	1052	993	993
q20	461	187	188	187
q21	4067	3300	3229	3229
q22	1093	996	1006	996
Total cold run time: 108201 ms
Total hot run time: 41651 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7260	7209	7228	7209
q2	346	260	244	244
q3	2924	2828	2811	2811
q4	1952	1711	1694	1694
q5	5463	5492	5563	5492
q6	216	139	137	137
q7	2147	1717	1703	1703
q8	3288	3475	3466	3466
q9	8626	8628	8626	8626
q10	3512	3461	3456	3456
q11	596	514	497	497
q12	805	560	564	560
q13	9707	3040	2985	2985
q14	303	255	257	255
q15	586	541	554	541
q16	689	660	622	622
q17	1841	1611	1599	1599
q18	7848	7563	7350	7350
q19	1676	1557	1461	1461
q20	2041	1834	1813	1813
q21	5481	5282	5233	5233
q22	1149	1016	995	995
Total cold run time: 68456 ms
Total hot run time: 58749 ms

@doris-robot

TPC-DS: Total hot run time: 193293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 51e308659e8cdd89cb05370ee128f34c1a754bb6, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	972	374	372	372
query2	6516	2096	2007	2007
query3	6789	212	222	212
query4	33839	23815	23668	23668
query5	4368	437	434	434
query6	272	177	182	177
query7	4593	291	290	290
query8	280	232	224	224
query9	9330	2671	2675	2671
query10	485	250	251	250
query11	18217	15236	15672	15236
query12	157	105	94	94
query13	1658	417	404	404
query14	10559	7130	6941	6941
query15	253	176	175	175
query16	8097	479	457	457
query17	1653	572	545	545
query18	2455	604	615	604
query19	402	183	180	180
query20	115	109	112	109
query21	209	102	101	101
query22	4639	4236	4405	4236
query23	34922	34244	34321	34244
query24	11615	3341	3335	3335
query25	676	412	407	407
query26	1412	184	180	180
query27	2768	279	280	279
query28	8211	2424	2406	2406
query29	882	431	432	431
query30	468	313	320	313
query31	1020	799	814	799
query32	93	58	62	58
query33	775	283	273	273
query34	983	518	522	518
query35	917	756	736	736
query36	1124	938	962	938
query37	140	80	79	79
query38	4504	4166	4402	4166
query39	1477	1414	1453	1414
query40	289	101	102	101
query41	52	48	45	45
query42	111	102	103	102
query43	534	503	494	494
query44	1357	830	812	812
query45	184	168	172	168
query46	1164	702	697	697
query47	1933	1834	1867	1834
query48	432	325	322	322
query49	1165	430	410	410
query50	801	398	393	393
query51	7437	7108	7112	7108
query52	100	92	89	89
query53	259	188	184	184
query54	1312	430	426	426
query55	82	80	80	80
query56	254	256	240	240
query57	1291	1226	1166	1166
query58	238	200	228	200
query59	3182	3226	3182	3182
query60	275	255	252	252
query61	115	112	110	110
query62	907	699	666	666
query63	212	202	186	186
query64	5706	655	623	623
query65	3341	3217	3227	3217
query66	1456	305	342	305
query67	16195	15932	15726	15726
query68	5046	598	577	577
query69	431	257	251	251
query70	1205	1143	1102	1102
query71	409	253	251	251
query72	6733	4051	4022	4022
query73	771	366	359	359
query74	10287	9059	9045	9045
query75	3454	2689	2690	2689
query76	2881	1057	1031	1031
query77	400	285	287	285
query78	10242	9404	9618	9404
query79	1658	598	608	598
query80	1091	429	430	429
query81	550	236	243	236
query82	863	121	115	115
query83	231	162	168	162
query84	246	75	75	75
query85	1313	323	297	297
query86	393	299	310	299
query87	4943	4785	4617	4617
query88	3454	2249	2192	2192
query89	393	293	298	293
query90	1991	189	183	183
query91	135	104	102	102
query92	60	48	48	48
query93	1345	556	546	546
query94	925	296	292	292
query95	345	252	239	239
query96	614	289	284	284
query97	2847	2671	2706	2671
query98	208	204	198	198
query99	1569	1300	1302	1300
Total cold run time: 305262 ms
Total hot run time: 193293 ms

@doris-robot

ClickBench: Total hot run time: 33.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 51e308659e8cdd89cb05370ee128f34c1a754bb6, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.04	0.03
query2	0.06	0.03	0.03
query3	0.22	0.07	0.07
query4	1.63	0.09	0.11
query5	0.41	0.40	0.41
query6	1.17	0.66	0.64
query7	0.04	0.01	0.01
query8	0.04	0.03	0.03
query9	0.57	0.51	0.50
query10	0.54	0.55	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.11
query13	0.60	0.59	0.59
query14	2.80	2.77	2.84
query15	0.90	0.83	0.83
query16	0.39	0.36	0.37
query17	0.99	0.95	1.05
query18	0.20	0.20	0.20
query19	2.00	1.89	2.01
query20	0.01	0.01	0.01
query21	15.35	0.59	0.58
query22	2.58	2.10	2.41
query23	17.00	0.93	0.69
query24	3.30	1.84	2.41
query25	0.24	0.12	0.04
query26	0.60	0.12	0.13
query27	0.04	0.04	0.04
query28	8.74	1.10	1.06
query29	12.55	3.28	3.20
query30	0.25	0.06	0.05
query31	2.87	0.39	0.38
query32	3.27	0.46	0.47
query33	2.98	3.03	3.12
query34	16.99	4.41	4.50
query35	4.49	4.51	4.50
query36	0.70	0.48	0.48
query37	0.08	0.06	0.05
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 105.31 s
Total hot run time: 33.44 s

zhannngchen previously approved these changes Nov 8, 2024
Contributor

@zhannngchen zhannngchen left a comment

LGTM

Contributor

github-actions bot commented Nov 8, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 8, 2024
Contributor

github-actions bot commented Nov 8, 2024

PR approved by anyone and no changes requested.

@hust-hhb
Contributor Author

hust-hhb commented Nov 8, 2024

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 8, 2024
@@ -475,6 +475,10 @@ private void commitTransaction(long dbId, List<Table> tableList, long transactio

List<OlapTable> mowTableList = getMowTableList(tableList, tabletCommitInfos);
if (!mowTableList.isEmpty()) {
// may be this txn has been calculated by previously task but commit rpc is timeout,
Contributor


Use BE rather than "be"; it is easily confused with the English word "be".

Contributor Author


@zhannngchen Here it means "may be", not BE.

Contributor

@zhannngchen zhannngchen left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 11, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@dataroaring dataroaring left a comment


LGTM

@zhannngchen
Contributor

run performance

@dataroaring dataroaring merged commit 7b2547c into apache:master Nov 13, 2024
30 of 35 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 13, 2024
dataroaring pushed a commit that referenced this pull request Nov 13, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.3-merged reviewed
4 participants