Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ER_LOCK_DEADLOCK #541

Open
leobarcellos opened this issue Nov 13, 2024 · 16 comments
Open

ER_LOCK_DEADLOCK #541

leobarcellos opened this issue Nov 13, 2024 · 16 comments

Comments

@leobarcellos
Copy link
Contributor

I'm receiving constantly deadlock issues and I don't know what else I could do.
I was working with 5 workers, maybe they were competing. However I just modified to just 1 worker and it's still happening.

I even tried putting some locks (using acquireLock and releaseLock) before doing these jobs, but even this doesn't worked (the ER_LOCK_DEADLOCK error is from AWS RDS Aurora MySQL and I'm not sure how can I solve this)

image

Deadlock issues generally are solved on the application, but I don't know if this is the case, I even tried to use this repo:
https://www.npmjs.com/package/@tanjaae/knex-mysql2-deadlock

But could not manage it to work, I think they conflict with current knex/mysql2 versions.

@pushchris don't you suffer with these deadlock on worker instances?

@leobarcellos
Copy link
Contributor Author

image

Is this expected behaviour? Or something is wrong because I have 8k+ jobs on queue?

@leobarcellos
Copy link
Contributor Author

Update: I increased REDIS_CONCURRENCY of workers and increased memory a little bit, now sometimes I get 0 in queue, and on busy hours it's average 3k in queue

However, I sill see deadlock issues.

@pushchris
Copy link
Contributor

If you have perpetual jobs in the queue you normally should spin up an additional worker to help handle the load. Would generally suggest going that approach vs increasing the concurrency too high. There will most likely always be jobs in the queue though if you have data always coming in, unsure of what your setup is.

As for the deadlocks, looks like they are getting retried (jobs all retry) but still failing which is interesting and leads me to believe whatever the lock is may be fairly long running. Could you see if you can get any more information on the deadlocks? https://www.percona.com/blog/how-to-deal-with-mysql-deadlocks/ would be curious to see what table is preventing the inserts. Typically inserts don't have problems unless 1) the record already exists (which is most likely not the case here since on retry it still fails) or 2) an index behind the record is being blocked by other inserts of operations.

@leobarcellos
Copy link
Contributor Author

Yeah, I checked some of the deadlock logs.
Seems like it's related to journey step.

LATEST DETECTED DEADLOCK
------------------------
2024-11-19 16:53:41 22463513548544
*** (1) TRANSACTION:
TRANSACTION 174552900, ACTIVE 41 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 146539, OS thread handle 22466070546176, query id 607006322 172.30.2.161 overwatch updating
update `journey_user_step` set `type` = 'pending', `updated_at` = '2024-11-19 16:53:00.071' where `journey_id` = 42 and `type` = 'delay' and `delay_until` <= '2024-11-19 16:53:00.071'

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 93799 n bits 480 index journey_user_step_journey_id_foreign of table `parcelvoy`.`journey_user_step` trx id 174552900 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 0000002a; asc    *;;
 1: len 4; hex 00113852; asc   8R;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 19501 n bits 152 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 174552900 lock_mode X locks rec but not gap waiting
Record lock, heap no 50 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 00113852; asc   8R;;
 1: len 6; hex 000000001790; asc       ;;
 2: len 7; hex 82000000c71092; asc        ;;
 3: len 4; hex 0000b7cb; asc     ;;
 4: len 4; hex 0000002a; asc    *;;
 5: len 4; hex 00000291; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 670d3628; asc g 6(;;
 8: len 4; hex 670d7bcd; asc g { ;;
 9: SQL NULL;
 10: SQL NULL;
 11: len 4; hex 670d7bcd; asc g { ;;
 12: len 30; hex 000100de030b0005000010006576656e741e00ce03d6000300d9000300dc; asc             event             ; (total 991 bytes);
 13: SQL NULL;


*** (2) TRANSACTION:
TRANSACTION 174518507, ACTIVE 99 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 64442 lock struct(s), heap size 7348344, 6219107 row lock(s)
MySQL thread id 146536, OS thread handle 22463511447296, query id 606884421 172.30.2.161 overwatch updating
update `journey_user_step` set `type` = 'pending', `updated_at` = '2024-11-19 16:52:02.662' where `journey_id` = 25 and `type` = 'delay' and `delay_until` <= '2024-11-19 16:52:02.662'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 19501 n bits 152 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 174518507 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 00113822; asc   8";;
 1: len 6; hex 000000001790; asc       ;;
 2: len 7; hex 82000000c70df2; asc        ;;
 3: len 4; hex 0000a442; asc    B;;
 4: len 4; hex 0000002d; asc    -;;
 5: len 4; hex 000002cd; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 670d3606; asc g 6 ;;
 8: len 4; hex 670d3615; asc g 6 ;;
 9: SQL NULL;
 10: SQL NULL;
 11: len 4; hex 670d3615; asc g 6 ;;
 12: len 30; hex 000100d8000b0005000010006576656e740800c8003c0003003f00040043; asc             event    <   ?   C; (total 217 bytes);
 13: SQL NULL;

@leobarcellos
Copy link
Contributor Author

Interestingly, I ran a select count(*) from journey_user_step to check the size, and it made me wait like 40+ seconds to retrieve the answer.

(And count was 6M)

Weird how long this is taking

@leobarcellos
Copy link
Contributor Author

image

These average latency is kinda high for some SQL, like, 3s, 5s, 6s on average for a query is weirdly high.

@leobarcellos
Copy link
Contributor Author

Well, I just increased the AWS RDS instance size and it looks like the problem went away. At least on the first hour I cannot see any deadlock, I will update here after 24h

Maybe the bottleneck was on the DB instance itself

@leobarcellos
Copy link
Contributor Author

Well, too soon. I just got a deadlock. But for sure it lowered the frequency.

TRANSACTION 181033389, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 9 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 1
MySQL thread id 1623, OS thread handle 22849702577920, query id 6008867 172.30.2.161 overwatch update
insert into `journey_user_step` (`created_at`, `data`, `delay_until`, `ended_at`, `entrance_id`, `id`, `journey_id`, `ref`, `step_id`, `type`, `updated_at`, `user_id`) values ('2024-11-19 21:31:09.934', DEFAULT, DEFAULT, DEFAULT, 6248642, DEFAULT, 138, DEFAULT, 3382, 'completed', '2024-11-19 21:31:09.938', 211139)
RECORD LOCKS space id 56 page no 110020 n bits 952 index journey_user_step_journey_id_foreign of table `parcelvoy`.`journey_user_step` trx id 181033389 lock_mode X locks rec but not gap
Record lock, heap no 883 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 0000008a; asc     ;;
 1: len 4; hex 005f592f; asc  _Y/;;

RECORD LOCKS space id 56 page no 113313 n bits 176 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 181033389 lock mode S locks rec but not gap waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58c2; asc  _X ;;
 1: len 6; hex 00000aca03fd; asc       ;;
 2: len 7; hex 82000000b50110; asc        ;;
 3: len 4; hex 000338c3; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d3b; asc    ;;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d0384; asc g=  ;;
 8: len 4; hex 673d0385; asc g=  ;;
 9: SQL NULL;
 10: SQL NULL;
 11: SQL NULL;
 12: len 30; hex 00010052000b0005000010006576656e74020042001200040016000a000c; asc    R        event  B          ; (total 83 bytes);
 13: SQL NULL;

TRANSACTION 181033187, ACTIVE 2 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 8275 lock struct(s), heap size 958584, 68733 row lock(s)
MySQL thread id 1585, OS thread handle 22863496779520, query id 6008125 172.30.2.161 overwatch updating
update `journey_user_step` set `type` = 'pending', `updated_at` = '2024-11-19 21:31:07.758' where `journey_id` = 138 and `type` = 'delay' and `delay_until` <= '2024-11-19 21:31:07.758'
RECORD LOCKS space id 56 page no 113313 n bits 176 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 181033187 lock_mode X locks rec but not gap
Record lock, heap no 8 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58c2; asc  _X ;;
 1: len 6; hex 00000aca03fd; asc       ;;
 2: len 7; hex 82000000b50110; asc        ;;
 3: len 4; hex 000338c3; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d3b; asc    ;;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d0384; asc g=  ;;
 8: len 4; hex 673d0385; asc g=  ;;
 9: SQL NULL;
 10: SQL NULL;
 11: SQL NULL;
 12: len 30; hex 00010052000b0005000010006576656e74020042001200040016000a000c; asc    R        event  B          ; (total 83 bytes);
 13: SQL NULL;

Record lock, heap no 9 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58c3; asc  _X ;;
 1: len 6; hex 00000aca0432; asc      2;;
 2: len 7; hex 81000000de0110; asc        ;;
 3: len 4; hex 000338c2; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d3b; asc    ;;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d0384; asc g=  ;;
 8: len 4; hex 673d0385; asc g=  ;;
 9: SQL NULL;
 10: SQL NULL;
 11: SQL NULL;
 12: len 30; hex 00010052000b0005000010006576656e74020042001200040016000a000c; asc    R        event  B          ; (total 83 bytes);
 13: SQL NULL;

Record lock, heap no 46 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58e8; asc  _X ;;
 1: len 6; hex 00000aca2079; asc      y;;
 2: len 7; hex 82000001de0110; asc        ;;
 3: len 4; hex 000338c3; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d27; asc    ';;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d038c; asc g=  ;;
 8: len 4; hex 673d038c; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f58c2; asc  _X ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 48 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58ea; asc  _X ;;
 1: len 6; hex 00000aca20b3; asc       ;;
 2: len 7; hex 810000016d0110; asc     m  ;;
 3: len 4; hex 000338c2; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d27; asc    ';;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d038d; asc g=  ;;
 8: len 4; hex 673d038d; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f58c3; asc  _X ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 49 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58eb; asc  _X ;;
 1: len 6; hex 00000aca20ef; asc       ;;
 2: len 7; hex 82000000d10110; asc        ;;
 3: len 4; hex 000338c2; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d17; asc     ;;
 6: len 6; hex 616374696f6e; asc action;;
 7: len 4; hex 673d038d; asc g=  ;;
 8: len 4; hex 673d038d; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f58c3; asc  _X ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 53 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58ef; asc  _X ;;
 1: len 6; hex 00000aca2416; asc     $ ;;
 2: len 7; hex 810000010d0110; asc        ;;
 3: len 4; hex 00028c85; asc     ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d1c; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d038d; asc g=  ;;
 8: len 4; hex 673d038d; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f5832; asc  _X2;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 54 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58f0; asc  _X ;;
 1: len 6; hex 00000aca2542; asc     %B;;
 2: len 7; hex 82000000af0110; asc        ;;
 3: len 4; hex 0002aec4; asc     ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d1d; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d038e; asc g=  ;;
 8: len 4; hex 673d038e; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f5831; asc  _X1;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 106 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 005f58e9; asc  _X ;;
 1: len 6; hex 00000aca598b; asc     Y ;;
 2: len 7; hex 02000001f32cbc; asc      , ;;
 3: len 4; hex 000338c3; asc   8 ;;
 4: len 4; hex 0000008a; asc     ;;
 5: len 4; hex 00000d28; asc    (;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673d038d; asc g=  ;;
 8: len 4; hex 673d039e; asc g=  ;;
 9: SQL NULL;
 10: len 4; hex 005f58c2; asc  _X ;;
 11: SQL NULL;
 12: len 2; hex 0400; asc   ;;
 13: SQL NULL;

RECORD LOCKS space id 56 page no 110020 n bits 952 index journey_user_step_journey_id_foreign of table `parcelvoy`.`journey_user_step` trx id 181033187 lock_mode X waiting
Record lock, heap no 883 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 0000008a; asc     ;;
 1: len 4; hex 005f592f; asc  _Y/;;

@leobarcellos
Copy link
Contributor Author

@pushchris looks like the problem is the foreign key journey_user_step_journey_id_foreign -- is it really needed?

@pushchris
Copy link
Contributor

pushchris commented Nov 20, 2024

Well this is a fun one. There is definitely an index missing but I'm unsure how. All of the hosted versions I have include an index on the type and delay_until columns but its missing in the codebase so must have been removed or merged poorly at some point. I just added that in which should then speed up the query to find steps ready to be traversed which is what is currently locking your tables

@pushchris
Copy link
Contributor

@leobarcellos were you able to try out adding the indexes in to see if it solved the issue for you?

@leobarcellos
Copy link
Contributor Author

@pushchris Sure! I updated yesterday. Thanks for this fix. I can see that the deadlock errors decreased fairly, however I'm still receiving some.

Most of them still on the same table:

2024-11-22 15:51:06 22850015844096
*** (1) TRANSACTION:
TRANSACTION 303404576, ACTIVE 1 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 683 lock struct(s), heap size 90232, 3462 row lock(s)
MySQL thread id 82812, OS thread handle 22849891333888, query id 435631751 172.30.2.161 overwatch Searching rows for update
update `journey_user_step` set `type` = 'pending', `updated_at` = '2024-11-22 15:51:05.981' where `journey_id` = 50 and `type` = 'delay' and `delay_until` <= '2024-11-22 15:51:05.981'

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 133256 n bits 864 index journey_user_step_type_delay_until_index of table `parcelvoy`.`journey_user_step` trx id 303404576 lock_mode X
Record lock, heap no 333 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 5; hex 64656c6179; asc delay;;
 1: len 4; hex 6740a830; asc g@ 0;;
 2: len 4; hex 006b1627; asc  k ';;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 137614 n bits 232 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 303404576 lock_mode X locks rec but not gap waiting
Record lock, heap no 160 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b1627; asc  k ';;
 1: len 6; hex 0000121596ad; asc       ;;
 2: len 7; hex 01000003c61e43; asc       C;;
 3: len 4; hex 00036b1a; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012d2; asc     ;;
 6: len 7; hex 70656e64696e67; asc pending;;
 7: len 4; hex 673fff70; asc g? p;;
 8: len 4; hex 6740a86a; asc g@ j;;
 9: len 4; hex 6740a830; asc g@ 0;;
 10: len 4; hex 006b1609; asc  k  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;


*** (2) TRANSACTION:
TRANSACTION 303404717, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 209 lock struct(s), heap size 24696, 1088 row lock(s), undo log entries 1
MySQL thread id 82795, OS thread handle 22857536276224, query id 435631975 172.30.2.161 overwatch updating
update `journey_user_step` set `type` = 'pending', `updated_at` = '2024-11-22 15:51:06.203' where `journey_id` = 184 and `type` = 'delay' and `delay_until` <= '2024-11-22 15:51:06.203'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 137614 n bits 232 index PRIMARY of table `parcelvoy`.`journey_user_step` trx id 303404717 lock_mode X locks rec but not gap
Record lock, heap no 73 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b1609; asc  k  ;;
 1: len 6; hex 00001071d629; asc    q );;
 2: len 7; hex 82000001a50110; asc        ;;
 3: len 4; hex 00036b1a; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012b3; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673fff6e; asc g? n;;
 8: len 4; hex 673fff6e; asc g? n;;
 9: SQL NULL;
 10: SQL NULL;
 11: SQL NULL;
 12: len 30; hex 00010052000b0005000010006576656e74020042001200040016000a000c; asc    R        event  B          ; (total 83 bytes);
 13: SQL NULL;

Record lock, heap no 74 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b160a; asc  k  ;;
 1: len 6; hex 00001071d641; asc    q A;;
 2: len 7; hex 81000000aa0110; asc        ;;
 3: len 4; hex 00036b19; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012c2; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673fff6e; asc g? n;;
 8: len 4; hex 673fff6f; asc g? o;;
 9: SQL NULL;
 10: SQL NULL;
 11: SQL NULL;
 12: len 30; hex 00010052000b0005000010006576656e74020042001200040016000a000c; asc    R        event  B          ; (total 83 bytes);
 13: SQL NULL;

Record lock, heap no 101 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b1625; asc  k %;;
 1: len 6; hex 00001071da75; asc    q u;;
 2: len 7; hex 82000000910110; asc        ;;
 3: len 4; hex 00036b19; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012d6; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673fff70; asc g? p;;
 8: len 4; hex 673fff70; asc g? p;;
 9: SQL NULL;
 10: len 4; hex 006b160a; asc  k  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 102 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b1626; asc  k &;;
 1: len 6; hex 00001071da76; asc    q v;;
 2: len 7; hex 810000015c0110; asc     \  ;;
 3: len 4; hex 00036b1a; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012a8; asc     ;;
 6: len 9; hex 636f6d706c65746564; asc completed;;
 7: len 4; hex 673fff70; asc g? p;;
 8: len 4; hex 673fff70; asc g? p;;
 9: SQL NULL;
 10: len 4; hex 006b1609; asc  k  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;

Record lock, heap no 160 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 4; hex 006b1627; asc  k ';;
 1: len 6; hex 0000121596ad; asc       ;;
 2: len 7; hex 01000003c61e43; asc       C;;
 3: len 4; hex 00036b1a; asc   k ;;
 4: len 4; hex 000000b8; asc     ;;
 5: len 4; hex 000012d2; asc     ;;
 6: len 7; hex 70656e64696e67; asc pending;;
 7: len 4; hex 673fff70; asc g? p;;
 8: len 4; hex 6740a86a; asc g@ j;;
 9: len 4; hex 6740a830; asc g@ 0;;
 10: len 4; hex 006b1609; asc  k  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 133256 n bits 864 index journey_user_step_type_delay_until_index of table `parcelvoy`.`journey_user_step` trx id 303404717 lock_mode X locks rec but not gap waiting
Record lock, heap no 333 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 5; hex 64656c6179; asc delay;;
 1: len 4; hex 6740a830; asc g@ 0;;
 2: len 4; hex 006b1627; asc  k ';;

*** WE ROLL BACK TRANSACTION (2)

I can see that eventually I get a few on campaign_sends as well, this one have 3 transactions, 3 locks and 3 tables in use, weird:

2024-11-21 20:39:03 22850015844096
*** (1) TRANSACTION:
TRANSACTION 267942180, ACTIVE 2 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 235 lock struct(s), heap size 41080, 32123 row lock(s)
MySQL thread id 57280, OS thread handle 22863496779520, query id 310926042 172.30.2.161 overwatch Searching rows for update
update `campaign_sends` set `state` = 'failed' where `campaign_sends`.`send_at` < '2024-11-19 20:39:01.192' and `campaign_sends`.`state` = 'throttled' and `campaign_id` = 2750

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 49 page no 31986 n bits 104 index campaign_sends_state_index of table `parcelvoy`.`campaign_sends` trx id 267942180 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 9; hex 7468726f74746c6564; asc throttled;;
 1: len 4; hex 00000175; asc    u;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 35524 n bits 1272 index campaign_sends_campaign_id_foreign of table `parcelvoy`.`campaign_sends` trx id 267942180 lock_mode X waiting
Record lock, heap no 634 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 00000abe; asc     ;;
 1: len 4; hex 00423719; asc  B7 ;;


*** (2) TRANSACTION:
TRANSACTION 267942554, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 16 lock struct(s), heap size 3488, 21 row lock(s), undo log entries 10
MySQL thread id 57210, OS thread handle 22849954195200, query id 310927677 172.30.2.161 overwatch update
insert into `campaign_sends` (`campaign_id`, `send_at`, `state`, `user_id`) values (2639, '2024-11-21 21:00:00.000', 'pending', 226969), (2750, '2024-11-22 03:00:00.000', 'pending', 226969), (2751, '2024-11-22 15:00:00.000', 'pending', 226969), (2753, '2024-11-22 21:00:00.000', 'pending', 226969), (2754, '2024-11-23 03:00:00.000', 'pending', 226969), (2755, '2024-11-23 15:00:00.000', 'pending', 226969), (2756, '2024-11-23 21:00:00.000', 'pending', 226969), (2757, '2024-11-24 03:00:00.000', 'pending', 226969), (2761, '2024-11-24 15:00:00.000', 'pending', 226969), (2762, '2024-11-24 21:00:00.000', 'pending', 226969), (2763, '2024-11-25 03:00:00.000', 'pending', 226969) on duplicate key update `state` = values(`state`), `send_at` = values(`send_at`)

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 49 page no 35524 n bits 1272 index campaign_sends_campaign_id_foreign of table `parcelvoy`.`campaign_sends` trx id 267942554 lock_mode X locks rec but not gap
Record lock, heap no 634 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 00000abe; asc     ;;
 1: len 4; hex 00423719; asc  B7 ;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 35568 n bits 1272 index campaign_sends_campaign_id_foreign of table `parcelvoy`.`campaign_sends` trx id 267942554 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 362 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 00000acb; asc     ;;
 1: len 4; hex 00419fdf; asc  A  ;;


*** (3) TRANSACTION:
TRANSACTION 267942197, ACTIVE 2 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 4 lock struct(s), heap size 1128, 3 row lock(s)
MySQL thread id 57257, OS thread handle 22860524590848, query id 310926228 172.30.2.161 overwatch Searching rows for update
update `campaign_sends` set `state` = 'failed' where `campaign_sends`.`send_at` < '2024-11-19 20:39:01.378' and `campaign_sends`.`state` = 'throttled' and `campaign_id` = 2763

*** (3) HOLDS THE LOCK(S):
RECORD LOCKS space id 49 page no 35568 n bits 1272 index campaign_sends_campaign_id_foreign of table `parcelvoy`.`campaign_sends` trx id 267942197 lock_mode X
Record lock, heap no 362 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 00000acb; asc     ;;
 1: len 4; hex 00419fdf; asc  A  ;;


*** (3) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 31986 n bits 104 index campaign_sends_state_index of table `parcelvoy`.`campaign_sends` trx id 267942197 lock_mode X waiting
Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 9; hex 7468726f74746c6564; asc throttled;;
 1: len 4; hex 00000175; asc    u;;

*** WE ROLL BACK TRANSACTION (3)

This second one, I could see that is caused by the checkStalledSends function that is called by the CampaignEnqueueSendsJobs handler. This basically changes the status of 'throttled' sends to 'failed' for the sends that are stuck on 'throttled' for more than 2 days. However this is done on every CampaignEnqueueSendsJobs, isn't it bad to be done there? Does it makes sense to create another job to do that? By the way, when a send is marked with 'failed', does it change anything?

@leobarcellos
Copy link
Contributor Author

Just got another different deadlock error, now on rule_evaluations

------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-11-22 21:02:40 22850015844096
*** (1) TRANSACTION:
TRANSACTION 313487207, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1128, 5 row lock(s)
MySQL thread id 89759, OS thread handle 22862786561792, query id 471002117 172.30.2.161 overwatch update
insert into `rule_evaluations` (`result`, `rule_id`, `user_id`) values (true, 176, 219661), (true, 178, 219661), (false, 176, 219663), (false, 178, 219663), (true, 176, 219664), (true, 178, 219664), (false, 176, 219686), (true, 178, 219686), (true, 176, 219700), (false, 178, 219700), (false, 176, 219723), (false, 178, 219723), (true, 176, 219734), (true, 178, 219734), (false, 176, 219757), (false, 178, 219757), (false, 176, 219759), (false, 178, 219759), (true, 176, 219773), (true, 178, 219773), (true, 176, 219781), (true, 178, 219781), (true, 176, 219787), (true, 178, 219787), (false, 176, 219815), (false, 178, 219815), (true, 176, 219817), (true, 178, 219817), (false, 176, 219820), (false, 178, 219820), (true, 176, 219842), (true, 178, 219842), (true, 176, 219850), (false, 178, 219850), (true, 176, 219857), (false, 178, 219857), (true, 176, 219865), (false, 178, 219865), (false, 176, 219877), (false, 178, 219877), (true, 176, 219889), (true, 178, 219889), (false, 176, 219900), (false, 178, 219900), (false, 176, 219930), (false, 178, 219930), (false, 176, 219949), (false, 178, 219949), (false, 176, 219970), (false, 178, 219970), (false, 176, 219985), (false, 178, 219985), (true, 176, 219988), (true, 178, 219988), (true, 176, 220002), (true, 178, 220002), (true, 176, 220004), (true, 178, 220004), (false, 176, 220005), (false, 178, 220005), (true, 176, 220015), (false, 178, 220015), (false, 176, 220027), (false, 178, 220027), (true, 176, 220030), (true, 178, 220030), (false, 176, 220058), (false, 178, 220058), (false, 176, 220063), (false, 178, 220063), (true, 176, 220064), (true, 178, 220064), (false, 176, 220306), (false, 178, 220306), (true, 176, 220310), (true, 178, 220310), (true, 176, 220312), (true, 178, 220312), (true, 176, 220317), (false, 178, 220317), (false, 176, 220320), (false, 178, 220320), (true, 176, 220323), (false, 178, 220323), (true, 176, 220330), (true, 178, 220330), (false, 176, 220334), (false, 178, 220334), (true, 176, 220339), (false, 178, 220339), (false, 176, 220357), (false, 178, 220357), (false, 176, 220379), (true, 178, 220379), (false, 176, 220401), (false, 178, 220401), (false, 176, 220403), (false, 178, 220403), (true, 176, 220409), (true, 178, 220409) on duplicate key update `result` = values(`result`)

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 69 page no 9905 n bits 328 index PRIMARY of table `parcelvoy`.`rule_evaluations` trx id 313487207 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 69 page no 9905 n bits 328 index PRIMARY of table `parcelvoy`.`rule_evaluations` trx id 313487207 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) TRANSACTION:
TRANSACTION 313487206, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 1
MySQL thread id 89764, OS thread handle 22849954195200, query id 471002116 172.30.2.161 overwatch update
insert into `rule_evaluations` (`result`, `rule_id`, `user_id`) values (false, 581, 188939), (false, 583, 188939), (true, 581, 188943), (true, 583, 188943), (true, 581, 188944), (false, 583, 188944), (true, 581, 188950), (false, 583, 188950), (true, 581, 188958), (false, 583, 188958), (false, 581, 188967), (false, 583, 188967), (true, 581, 188971), (false, 583, 188971), (false, 581, 188992), (false, 583, 188992), (false, 581, 188997), (false, 583, 188997), (false, 581, 188998), (false, 583, 188998), (false, 581, 189009), (false, 583, 189009), (false, 581, 189013), (false, 583, 189013), (false, 581, 189029), (false, 583, 189029), (false, 581, 189040), (false, 583, 189040), (true, 581, 189056), (true, 583, 189056), (false, 581, 189061), (false, 583, 189061), (false, 581, 189065), (false, 583, 189065), (false, 581, 189092), (false, 583, 189092), (true, 581, 189102), (false, 583, 189102), (true, 581, 189103), (true, 583, 189103), (true, 581, 189123), (false, 583, 189123), (true, 581, 189126), (false, 583, 189126), (false, 581, 189142), (false, 583, 189142), (false, 581, 189145), (false, 583, 189145), (false, 581, 189163), (false, 583, 189163), (false, 581, 189183), (false, 583, 189183), (false, 581, 189187), (false, 583, 189187), (false, 581, 189202), (false, 583, 189202), (false, 581, 189225), (false, 583, 189225), (false, 581, 189227), (false, 583, 189227), (false, 581, 189242), (false, 583, 189242), (false, 581, 189245), (false, 583, 189245), (false, 581, 189262), (false, 583, 189262), (false, 581, 189265), (false, 583, 189265), (false, 581, 189266), (false, 583, 189266), (false, 581, 189276), (false, 583, 189276), (false, 581, 189283), (false, 583, 189283), (false, 581, 189297), (false, 583, 189297), (false, 581, 189305), (false, 583, 189305), (true, 581, 189366), (false, 583, 189366), (true, 581, 189372), (true, 583, 189372), (true, 581, 189391), (true, 583, 189391), (true, 581, 189430), (true, 583, 189430), (false, 581, 189434), (false, 583, 189434), (true, 581, 189492), (false, 583, 189492), (true, 581, 189501), (true, 583, 189501), (false, 581, 189538), (false, 583, 189538), (true, 581, 189547), (false, 583, 189547), (true, 581, 189566), (true, 583, 189566), (true, 581, 189575), (false, 583, 189575) on duplicate key update `result` = values(`result`)

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 69 page no 9905 n bits 328 index PRIMARY of table `parcelvoy`.`rule_evaluations` trx id 313487206 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 69 page no 9905 n bits 328 index PRIMARY of table `parcelvoy`.`rule_evaluations` trx id 313487206 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (1)

@pushchris
Copy link
Contributor

What is your DB usage looking like? Specifically CPU and memory usage. This many deadlocks makes me think it's under provisioned since most of those tables are pretty simple but there may be something else underlying that is causing it.

@leobarcellos
Copy link
Contributor Author

What is your DB usage looking like? Specifically CPU and memory usage. This many deadlocks makes me think it's under provisioned since most of those tables are pretty simple but there may be something else underlying that is causing it.

I thought it too, but I increased the DB size 2 weeks ago and it's pretty consistent so far:

Screenshot 2024-12-03 at 4 24 19 PM Screenshot 2024-12-03 at 4 27 46 PM

Wait times still high (compared to others) on the queries I mentioned:

Screenshot 2024-12-03 at 4 23 13 PM

By the way, I haven't got the last update, will do and then I come back here :)

@pushchris
Copy link
Contributor

Awesome! #551 should hopefully help with the second and third items in the top SQL list you sent over and reduce total number of deadlocks on those tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants