
KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? #584

Open
leobarcellos opened this issue Dec 19, 2024 · 8 comments

Comments

@leobarcellos
Contributor


Not sure if this is related to #583, but we are facing issues that are causing real trouble here.

When our workers hit this KnexTimeoutError, they stall at 0% CPU and everything freezes.
When I restart the workers they recover, but eventually the error appears again and the whole cycle repeats.

I even tried reducing the 6 workers to just 1 with more CPU and memory, but the same thing happened.

Things I already tried:

  • Bumping the knex and mysql2 versions
  • Tuning the pool config: tried min: 0/max: 30, min: 10/max: 50, and several other combinations
  • Raising the pool timeout to 120, and even to 300
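
For reference, a minimal sketch of where these settings live, assuming the usual Knex options: pool.min, pool.max and pool.acquireTimeoutMillis inside the pool object, and acquireConnectionTimeout at the top level. Both timeouts are in milliseconds. Knex's client.js appears to take the smaller of the two (60000 ms if neither is set), so raising only one of them may have no effect; this is worth checking against the installed Knex version. The interfaces and the effectiveAcquireTimeout helper are hypothetical stand-ins so the sketch runs without Knex.

```typescript
// Local stand-ins for the subset of Knex config fields discussed here, so the
// sketch compiles without the knex package. All timeouts are milliseconds.
interface PoolConfig {
  min?: number;
  max?: number;
  acquireTimeoutMillis?: number;
}

interface KnexLikeConfig {
  client: string;
  acquireConnectionTimeout?: number;
  pool?: PoolConfig;
}

// Approximates how Knex appears to derive the pool's acquire timeout: the
// smallest of the values that are set, or 60000 ms when neither is set.
function effectiveAcquireTimeout(config: KnexLikeConfig): number {
  const timeouts = [config.acquireConnectionTimeout, config.pool?.acquireTimeoutMillis].filter(
    (t): t is number => t !== undefined,
  );
  return timeouts.length > 0 ? Math.min(...timeouts) : 60_000;
}

// Illustrative values only; together with connection settings, an object of
// this shape would be passed to knex(config).
const config: KnexLikeConfig = {
  client: 'mysql2',
  acquireConnectionTimeout: 300_000, // 300 s
  pool: { min: 0, max: 30, acquireTimeoutMillis: 120_000 }, // 120 s
};

console.log(effectiveAcquireTimeout(config)); // 120000
```

Under that reading, raising acquireConnectionTimeout to 300000 while pool.acquireTimeoutMillis stays at 120000 still times out after 120 s.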

Log:

KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n at async Runner.run (/usr/src/app/node_modules/knex/lib/execution/runner.js:30:19)\n at async user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:9)\n at async Chunker.flush (/usr/src/app/build/utilities/index.js:242:13)\n at async chunk (/usr/src/app/build/utilities/index.js:221:5)\n at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)\n at async handler (/usr/src/app/build/campaigns/CampaignGenerateListJob.js:33:9)\n at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)"},"stacktrace":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. 
Are you missing a .transacting(trx) call?\n at createQueryBuilder (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:320:26)\n at knex (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:101:12)\n at CampaignSend.table (/usr/src/app/build/core/Model.js:245:16)\n at CampaignSend.query (/usr/src/app/build/core/Model.js:59:21)\n at user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:39)\n at Chunker.flush (/usr/src/app/build/utilities/index.js:242:24)\n at chunk (/usr/src/app/build/utilities/index.js:221:19)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at runNextTicks (node:internal/process/task_queues:64:3)\n at listOnTimeout (node:internal/timers:538:9)\n at process.processTimers (node:internal/timers:512:7)\n at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)","job":{"name":"campaign_generate_list_job"

More logs:

worker-1  | 2024-12-19T15:14:09.078764999Z {"level":50,"time":1734621249078,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"TypeError","message":"Cannot read properties of undefined (reading '__knexUid')","stack":"TypeError: Cannot read properties of undefined (reading '__knexUid')\n    at Client_MySQL2.releaseConnection (/usr/src/app/node_modules/knex/lib/client.js:344:58)\n    at Transform.<anonymous> (/usr/src/app/node_modules/knex/lib/execution/runner.js:72:19)\n    at Transform.emit (node:events:529:35)\n    at Transform.emit (node:domain:489:12)\n    at emitCloseNT (node:internal/streams/destroy:132:10)\n    at emitErrorCloseNT (node:internal/streams/destroy:117:3)\n    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.079955133Z {"level":50,"time":1734621249079,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"KnexTimeoutError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n    at async chunk (/usr/src/app/build/utilities/index.js:216:5)\n    at async failStalledSends (/usr/src/app/build/campaigns/CampaignService.js:293:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignEnqueueSendsJob.js:40:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)","name":"KnexTimeoutError"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.267547045Z {"level":50,"time":1734621249267,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"TypeError","message":"Cannot read properties of undefined (reading '__knexUid')","stack":"TypeError: Cannot read properties of undefined (reading '__knexUid')\n    at Client_MySQL2.releaseConnection (/usr/src/app/node_modules/knex/lib/client.js:344:58)\n    at Transform.<anonymous> (/usr/src/app/node_modules/knex/lib/execution/runner.js:72:19)\n    at Transform.emit (node:events:529:35)\n    at Transform.emit (node:domain:489:12)\n    at emitCloseNT (node:internal/streams/destroy:132:10)\n    at emitErrorCloseNT (node:internal/streams/destroy:117:3)\n    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.268614034Z {"level":50,"time":1734621249268,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"KnexTimeoutError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n    at async chunk (/usr/src/app/build/utilities/index.js:216:5)\n    at async failStalledSends (/usr/src/app/build/campaigns/CampaignService.js:293:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignEnqueueSendsJob.js:40:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)","name":"KnexTimeoutError"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.410183643Z {"level":50,"time":1734621249409,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"TypeError","message":"Cannot read properties of undefined (reading '__knexUid')","stack":"TypeError: Cannot read properties of undefined (reading '__knexUid')\n    at Client_MySQL2.releaseConnection (/usr/src/app/node_modules/knex/lib/client.js:344:58)\n    at Transform.<anonymous> (/usr/src/app/node_modules/knex/lib/execution/runner.js:72:19)\n    at Transform.emit (node:events:529:35)\n    at Transform.emit (node:domain:489:12)\n    at emitCloseNT (node:internal/streams/destroy:132:10)\n    at emitErrorCloseNT (node:internal/streams/destroy:117:3)\n    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.412188811Z {"level":50,"time":1734621249411,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"KnexTimeoutError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n    at async chunk (/usr/src/app/build/utilities/index.js:216:5)\n    at async failStalledSends (/usr/src/app/build/campaigns/CampaignService.js:293:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignEnqueueSendsJob.js:40:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)","name":"KnexTimeoutError"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.940215774Z {"level":50,"time":1734621249939,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"TypeError","message":"Cannot read properties of undefined (reading '__knexUid')","stack":"TypeError: Cannot read properties of undefined (reading '__knexUid')\n    at Client_MySQL2.releaseConnection (/usr/src/app/node_modules/knex/lib/client.js:344:58)\n    at Transform.<anonymous> (/usr/src/app/node_modules/knex/lib/execution/runner.js:72:19)\n    at Transform.emit (node:events:529:35)\n    at Transform.emit (node:domain:489:12)\n    at emitCloseNT (node:internal/streams/destroy:132:10)\n    at emitErrorCloseNT (node:internal/streams/destroy:117:3)\n    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:14:09.940727866Z {"level":50,"time":1734621249940,"pid":8,"hostname":"9e4e7dfd9634","err":{"type":"KnexTimeoutError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n    at async chunk (/usr/src/app/build/utilities/index.js:216:5)\n    at async failStalledSends (/usr/src/app/build/campaigns/CampaignService.js:293:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignEnqueueSendsJob.js:40:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)","name":"KnexTimeoutError"},"msg":"uncaught error"}
worker-1  | 2024-12-19T15:15:08.888488789Z {"level":50,"time":1734621308888,"pid":8,"hostname":"9e4e7dfd9634","error":{"name":"KnexTimeoutError","originalStack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)\n    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)\n    at async Runner.run (/usr/src/app/node_modules/knex/lib/execution/runner.js:30:19)\n    at async user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:9)\n    at async Chunker.flush (/usr/src/app/build/utilities/index.js:242:13)\n    at async chunk (/usr/src/app/build/utilities/index.js:221:5)\n    at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignGenerateListJob.js:33:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)\n    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)\n    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)"},"stacktrace":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. 
Are you missing a .transacting(trx) call?\n    at createQueryBuilder (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:320:26)\n    at knex (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:101:12)\n    at CampaignSend.table (/usr/src/app/build/core/Model.js:245:16)\n    at CampaignSend.query (/usr/src/app/build/core/Model.js:59:21)\n    at user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:39)\n    at Chunker.flush (/usr/src/app/build/utilities/index.js:242:24)\n    at chunk (/usr/src/app/build/utilities/index.js:221:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)\n    at async handler (/usr/src/app/build/campaigns/CampaignGenerateListJob.js:33:9)\n    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)\n    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)","job":{"name":"campaign_generate_list_job"
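
The traces above show two paths through the chunk helper in utilities/index.js: in failStalledSends, chunk itself acquires a connection (and the TypeError frames come from a Transform stream in Knex's runner), while in generateSendList the Chunker.flush callback acquires another one for CampaignSend.query(). One hypothesis worth checking, not confirmed anywhere in this thread, is that each job holds a pooled connection for its streaming read while its flush waits for a second one. If the number of jobs running at once reaches pool.max, every job waits and nothing progresses until the acquire timeout fires, which would also fit the 0% CPU stall and the fact that longer timeouts did not help. The toy simulation below contains no Knex code; ToyPool, job and run are hypothetical names.

```typescript
// Toy connection pool (not Knex/tarn): at most `max` connections are checked
// out at once; acquire() waits up to `timeoutMs` before rejecting.
class ToyPool {
  private inUse = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly max: number, private readonly timeoutMs: number) {}

  acquire(): Promise<void> {
    if (this.inUse < this.max) {
      this.inUse++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      const grant = (): void => {
        clearTimeout(timer);
        this.inUse++;
        resolve();
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== grant);
        reject(new Error('Timeout acquiring a connection. The pool is probably full.'));
      }, this.timeoutMs);
      this.waiters.push(grant);
    });
  }

  release(): void {
    this.inUse--;
    const next = this.waiters.shift();
    if (next) next(); // hand the freed slot to the oldest waiter
  }
}

// Hypothetical job: holds one connection for a streaming read for its whole
// lifetime and needs a second one for the batch insert (flush).
async function job(pool: ToyPool): Promise<string> {
  await pool.acquire(); // connection held by the stream
  try {
    await pool.acquire(); // connection for the flush
    pool.release();
    return 'ok';
  } catch (err) {
    return (err as Error).message;
  } finally {
    pool.release(); // the stream's connection is freed only at the end
  }
}

// Runs `concurrency` jobs at once against a pool of size `max`.
async function run(max: number, concurrency: number): Promise<string[]> {
  const pool = new ToyPool(max, 100);
  return Promise.all(Array.from({ length: concurrency }, () => job(pool)));
}

run(2, 2).then((r) => console.log('max=2, jobs=2:', r)); // at least one timeout
run(3, 2).then((r) => console.log('max=3, jobs=2:', r)); // [ 'ok', 'ok' ]
```

With as many concurrent jobs as pool slots, some jobs can only finish by timing out; one spare slot lets them all complete. If the hypothesis holds, comparing the BullMQ worker concurrency with pool.max, or not holding the stream's connection while flushing, would be the things to look at.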

Lock waits have also increased significantly, which is related to #541:

LATEST DETECTED DEADLOCK
------------------------
2024-12-19 14:01:04 22850015844096
*** (1) TRANSACTION:
TRANSACTION 1510867228, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 9 lock struct(s), heap size 1128, 5 row lock(s)
MySQL thread id 820700, OS thread handle 22849620932352, query id 5196453104 172.30.2.161 overwatch update
insert into campaign_sends (campaign_id, send_at, state, user_id) values (5265, '2024-12-23 12:00:00.000', 'pending', 491098), (5265, '2024-12-23 12:00:00.000', 'pending', 484927), (5265, '2024-12-23 12:00:00.000', 'pending', 366772), (5265, '2024-12-23 12:00:00.000', 'pending', 370134), (5265, '2024-12-23 12:00:00.000', 'pending', 498434), (5265, '2024-12-23 12:00:00.000', 'pending', 476084), (5265, '2024-12-23 12:00:00.000', 'pending', 495188), (5265, '2024-12-23 12:00:00.000', 'pending', 482489), (5265, '2024-12-23 12:00:00.000', 'pending', 484886), (5265, '2024-12-23 12:00:00.000', 'pending', 477965), (5265, '2024-12-23 12:00:00.000', 'pending', 436281), (5265, '2024-12-23 12:00:00.000', 'pending', 298643), (5265, '2024-12-23 12:00:00.000', 'pending', 369129), (5265, '2024-12-23 12:00:00.000', 'pending', 494773), (5265, '2024-12-23 12:00:00.000', 'pending', 492828), (5265, '2024-12-23 12:00:00.000', 'pending', 499544), (5265, '2024-12-23 12:00:00.000', 'pending', 495346), (5265, '2024-12-23 12:00:00.000', 'pending', 298577), (5265, '2024-12-23 12:00:00.000', 'pending', 491625), (5265, '2024-12-23 12:00:00.000', 'pending', 492682), (5265, '2024-12-23 12:00:00.000', 'pending', 486056), (5265, '2024-12-23 12:00:00.000', 'pending', 363110), (5265, '2024-12-23 12:00:00.000', 'pending', 480480), (5265, '2024-12-23 12:00:00.000', 'pending', 482418), (5265, '2024-12-23 12:00:00.000', 'pending', 402823), (5265, '2024-12-23 12:00:00.000', 'pending', 485036), (5265, '2024-12-23 12:00:00.000', 'pending', 480538), (5265, '2024-12-23 12:00:00.000', 'pending', 298184), (5265, '2024-12-23 12:00:00.000', 'pending', 407056), (5265, '2024-12-23 12:00:00.000', 'pending', 476230), (5265, '2024-12-23 12:00:00.000', 'pending', 337225), (5265, '2024-12-23 12:00:00.000', 'pending', 332614), (5265, '2024-12-23 12:00:00.000', 'pending', 366316), (5265, '2024-12-23 12:00:00.000', 'pending', 478572), (5265, '2024-12-23 12:00:00.000', 'pending', 355376), (5265, '2024-12-23 12:00:00.000', 
'pending', 494923), (5265, '2024-12-23 12:00:00.000', 'pending', 486438), (5265, '2024-12-23 12:00:00.000', 'pending', 350067), (5265, '2024-12-23 12:00:00.000', 'pending', 500705), (5265, '2024-12-23 12:00:00.000', 'pending', 495404), (5265, '2024-12-23 12:00:00.000', 'pending', 480078), (5265, '2024-12-23 12:00:00.000', 'pending', 494392), (5265, '2024-12-23 12:00:00.000', 'pending', 477510), (5265, '2024-12-23 12:00:00.000', 'pending', 480930), (5265, '2024-12-23 12:00:00.000', 'pending', 488294), (5265, '2024-12-23 12:00:00.000', 'pending', 500331), (5265, '2024-12-23 12:00:00.000', 'pending', 464658), (5265, '2024-12-23 12:00:00.000', 'pending', 474791), (5265, '2024-12-23 12:00:00.000', 'pending', 477931), (5265, '2024-12-23 12:00:00.000', 'pending', 366158), (5265, '2024-12-23 12:00:00.000', 'pending', 501462), (5265, '2024-12-23 12:00:00.000', 'pending', 474161), (5265, '2024-12-23 12:00:00.000', 'pending', 503019), (5265, '2024-12-23 12:00:00.000', 'pending', 503150), (

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 49 page no 124772 n bits 304 index PRIMARY of table parcelvoy.campaign_sends trx id 1510867228 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 124772 n bits 304 index PRIMARY of table parcelvoy.campaign_sends trx id 1510867228 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) TRANSACTION:
TRANSACTION 1510867229, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1128, 5 row lock(s)
MySQL thread id 820958, OS thread handle 22863076603648, query id 5196453106 172.30.2.161 overwatch update
insert into campaign_sends (campaign_id, send_at, state, user_id) values (5249, '2024-12-22 12:00:00.000', 'pending', 491018), (5249, '2024-12-22 12:00:00.000', 'pending', 328431), (5249, '2024-12-22 12:00:00.000', 'pending', 507000), (5249, '2024-12-22 12:00:00.000', 'pending', 482231), (5249, '2024-12-22 12:00:00.000', 'pending', 361739), (5249, '2024-12-22 12:00:00.000', 'pending', 437863), (5249, '2024-12-22 12:00:00.000', 'pending', 335319), (5249, '2024-12-22 12:00:00.000', 'pending', 490655), (5249, '2024-12-22 12:00:00.000', 'pending', 354827), (5249, '2024-12-22 12:00:00.000', 'pending', 499292), (5249, '2024-12-22 12:00:00.000', 'pending', 491808), (5249, '2024-12-22 12:00:00.000', 'pending', 362232), (5249, '2024-12-22 12:00:00.000', 'pending', 366164), (5249, '2024-12-22 12:00:00.000', 'pending', 366907), (5249, '2024-12-22 12:00:00.000', 'pending', 500034), (5249, '2024-12-22 12:00:00.000', 'pending', 507295), (5249, '2024-12-22 12:00:00.000', 'pending', 422248), (5249, '2024-12-22 12:00:00.000', 'pending', 424598), (5249, '2024-12-22 12:00:00.000', 'pending', 349889), (5249, '2024-12-22 12:00:00.000', 'pending', 494496), (5249, '2024-12-22 12:00:00.000', 'pending', 374607), (5249, '2024-12-22 12:00:00.000', 'pending', 479655), (5249, '2024-12-22 12:00:00.000', 'pending', 490306), (5249, '2024-12-22 12:00:00.000', 'pending', 507729), (5249, '2024-12-22 12:00:00.000', 'pending', 506148), (5249, '2024-12-22 12:00:00.000', 'pending', 508027), (5249, '2024-12-22 12:00:00.000', 'pending', 507933), (5249, '2024-12-22 12:00:00.000', 'pending', 488541), (5249, '2024-12-22 12:00:00.000', 'pending', 508155), (5249, '2024-12-22 12:00:00.000', 'pending', 404523), (5249, '2024-12-22 12:00:00.000', 'pending', 506737), (5249, '2024-12-22 12:00:00.000', 'pending', 482630), (5249, '2024-12-22 12:00:00.000', 'pending', 399965), (5249, '2024-12-22 12:00:00.000', 'pending', 508744), (5249, '2024-12-22 12:00:00.000', 'pending', 375879), (5249, '2024-12-22 12:00:00.000', 
'pending', 507487), (5249, '2024-12-22 12:00:00.000', 'pending', 507857), (5249, '2024-12-22 12:00:00.000', 'pending', 490066), (5249, '2024-12-22 12:00:00.000', 'pending', 507881), (5249, '2024-12-22 12:00:00.000', 'pending', 508720), (5249, '2024-12-22 12:00:00.000', 'pending', 321367), (5249, '2024-12-22 12:00:00.000', 'pending', 495303), (5249, '2024-12-22 12:00:00.000', 'pending', 509026), (5249, '2024-12-22 12:00:00.000', 'pending', 365243), (5249, '2024-12-22 12:00:00.000', 'pending', 509069), (5249, '2024-12-22 12:00:00.000', 'pending', 353389), (5249, '2024-12-22 12:00:00.000', 'pending', 338448), (5249, '2024-12-22 12:00:00.000', 'pending', 485621), (5249, '2024-12-22 12:00:00.000', 'pending', 509218), (5249, '2024-12-22 12:00:00.000', 'pending', 250077), (5249, '2024-12-22 12:00:00.000', 'pending', 351554), (5249, '2024-12-22 12:00:00.000', 'pending', 491462), (5249, '2024-12-22 12:00:00.000', 'pending', 370454), (5249, '2024-12-22 12:00:00.000', 'pending', 495046), (

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 49 page no 124772 n bits 304 index PRIMARY of table parcelvoy.campaign_sends trx id 1510867229 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 124772 n bits 304 index PRIMARY of table parcelvoy.campaign_sends trx id 1510867229 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)
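
In the output above, two concurrent multi-row INSERTs into campaign_sends (campaign_id 5265 and 5249) each hold an X lock on the supremum record of the same PRIMARY index page and wait for an insert-intention lock there, so InnoDB rolls back transaction (2). The MySQL manual advises applications to be ready to re-issue a transaction that fails because of a deadlock. Separately, a statement that is merely waiting on a lock keeps its pooled connection checked out for up to innodb_lock_wait_timeout seconds (120 in the SHOW VARIABLES output below). A minimal retry wrapper is sketched here, assuming the mysql2 driver's error fields (code 'ER_LOCK_DEADLOCK', errno 1213); withDeadlockRetry and insertBatch are hypothetical names. Retrying only mitigates the symptom and does not remove the lock conflict, and if the insert runs inside a transaction the whole transaction must be retried, because InnoDB rolls back the entire victim transaction.

```typescript
// Fields the mysql2 driver sets on server errors, declared locally so the
// sketch runs without dependencies.
interface MysqlLikeError extends Error {
  code?: string;
  errno?: number;
}

const ER_LOCK_DEADLOCK = 1213; // MySQL server error number for deadlocks

function isDeadlock(err: unknown): boolean {
  const e = err as MysqlLikeError | undefined;
  return e?.code === 'ER_LOCK_DEADLOCK' || e?.errno === ER_LOCK_DEADLOCK;
}

// Re-runs `fn` when it fails with a deadlock, waiting a little longer each
// time; any other error, or running out of retries, is rethrown.
async function withDeadlockRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 50,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isDeadlock(err) || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * (attempt + 1)));
    }
  }
}

// Hypothetical batch insert that deadlocks twice before succeeding.
let calls = 0;
async function insertBatch(): Promise<string> {
  calls++;
  if (calls <= 2) {
    const err: MysqlLikeError = new Error(
      'Deadlock found when trying to get lock; try restarting transaction',
    );
    err.code = 'ER_LOCK_DEADLOCK';
    err.errno = ER_LOCK_DEADLOCK;
    throw err;
  }
  return 'inserted';
}

withDeadlockRetry(insertBatch).then((result) => console.log(result, calls)); // inserted 3
```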
@pushchris
Contributor

Can you tell me which MySQL engine version you are using?

@leobarcellos
Contributor Author

Can you tell me which MySQL engine version you are using?

Sure!
Engine version: 8.0.39

It's on AWS RDS, Community MySQL.

@leobarcellos
Contributor Author

Output of SHOW VARIABLES:

Variable_name Value
activate_all_roles_on_login OFF
admin_address
admin_port 33062
admin_ssl_ca
admin_ssl_capath
admin_ssl_cert
admin_ssl_cipher
admin_ssl_crl
admin_ssl_crlpath
admin_ssl_key
admin_tls_ciphersuites
admin_tls_version TLSv1.2,TLSv1.3
authentication_policy *
auto_generate_certs ON
auto_increment_increment 1
auto_increment_offset 1
autocommit ON
automatic_sp_privileges ON
avoid_temporal_upgrade OFF
back_log 1300
basedir /rdsdbbin/mysql-8.0.39.R1/
big_tables OFF
bind_address *
binlog_cache_size 32768
binlog_checksum CRC32
binlog_direct_non_transactional_updates OFF
binlog_encryption OFF
binlog_error_action ABORT_SERVER
binlog_expire_logs_auto_purge ON
binlog_expire_logs_seconds 2592000
binlog_format MIXED
binlog_group_commit_sync_delay 0
binlog_group_commit_sync_no_delay_count 0
binlog_gtid_simple_recovery ON
binlog_max_flush_queue_time 0
binlog_order_commits ON
binlog_rotate_encryption_master_key_at_startup OFF
binlog_row_event_max_size 8192
binlog_row_image FULL
binlog_row_metadata MINIMAL
binlog_row_value_options
binlog_rows_query_log_events OFF
binlog_stmt_cache_size 32768
binlog_transaction_compression OFF
binlog_transaction_compression_level_zstd 3
binlog_transaction_dependency_history_size 25000
binlog_transaction_dependency_tracking COMMIT_ORDER
block_encryption_mode aes-128-ecb
bulk_insert_buffer_size 8388608
caching_sha2_password_auto_generate_rsa_keys ON
caching_sha2_password_digest_rounds 5000
caching_sha2_password_private_key_path private_key.pem
caching_sha2_password_public_key_path public_key.pem
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results
character_set_server utf8mb4
character_set_system utf8mb3
character_sets_dir /rdsdbbin/mysql-8.0.39.R1/share/charsets/
check_proxy_users OFF
collation_connection utf8mb4_0900_ai_ci
collation_database utf8mb4_0900_ai_ci
collation_server utf8mb4_0900_ai_ci
completion_type NO_CHAIN
concurrent_insert AUTO
connect_timeout 10
connection_memory_chunk_size 8192
connection_memory_limit 18446744073709551615
core_file OFF
create_admin_listener_thread OFF
cte_max_recursion_depth 1000
datadir /rdsdbdata/db/
default_authentication_plugin mysql_native_password
default_collation_for_utf8mb4 utf8mb4_0900_ai_ci
default_password_lifetime 0
default_storage_engine InnoDB
default_table_encryption OFF
default_tmp_storage_engine InnoDB
default_week_format 0
delay_key_write ON
delayed_insert_limit 100
delayed_insert_timeout 300
delayed_queue_size 1000
disabled_storage_engines
disconnect_on_expired_password ON
div_precision_increment 4
end_markers_in_json OFF
enforce_gtid_consistency OFF
eq_range_index_dive_limit 200
error_count 0
event_scheduler ON
expire_logs_days 0
explain_format TRADITIONAL
explicit_defaults_for_timestamp ON
external_user
flush OFF
flush_time 0
foreign_key_checks ON
ft_boolean_syntax + -><()~*:""&|
ft_max_word_len 84
ft_min_word_len 4
ft_query_expansion_limit 20
ft_stopword_file (built-in)
general_log OFF
general_log_file /rdsdbdata/log/general/mysql-general.log
generated_random_password_length 20
global_connection_memory_limit 18446744073709551615
global_connection_memory_tracking OFF
group_concat_max_len 1024
group_replication_consistency EVENTUAL
gtid_executed
gtid_executed_compression_period 0
gtid_mode OFF_PERMISSIVE
gtid_next AUTOMATIC
gtid_owned
gtid_purged
have_compress YES
have_dynamic_loading YES
have_geometry YES
have_openssl YES
have_profiling YES
have_query_cache NO
have_rtree_keys YES
have_ssl YES
have_statement_timeout YES
have_symlink DISABLED
histogram_generation_max_mem_size 20000000
host_cache_size 668
hostname ip-10-1-0-102
identity 0
immediate_server_version 999999
information_schema_stats_expiry 86400
init_connect
init_file
init_replica
init_slave
innodb_adaptive_flushing ON
innodb_adaptive_flushing_lwm 10
innodb_adaptive_hash_index ON
innodb_adaptive_hash_index_parts 8
innodb_adaptive_max_sleep_delay 150000
innodb_api_bk_commit_interval 5
innodb_api_disable_rowlock OFF
innodb_api_enable_binlog OFF
innodb_api_enable_mdl OFF
innodb_api_trx_level 0
innodb_autoextend_increment 64
innodb_autoinc_lock_mode 2
innodb_buffer_pool_chunk_size 134217728
innodb_buffer_pool_dump_at_shutdown ON
innodb_buffer_pool_dump_now OFF
innodb_buffer_pool_dump_pct 25
innodb_buffer_pool_filename ib_buffer_pool
innodb_buffer_pool_in_core_file ON
innodb_buffer_pool_instances 8
innodb_buffer_pool_load_abort OFF
innodb_buffer_pool_load_at_startup ON
innodb_buffer_pool_load_now OFF
innodb_buffer_pool_size 11811160064
innodb_change_buffer_max_size 50
innodb_change_buffering all
innodb_checksum_algorithm crc32
innodb_cmp_per_index_enabled OFF
innodb_commit_concurrency 0
innodb_compression_failure_threshold_pct 5
innodb_compression_level 6
innodb_compression_pad_pct_max 50
innodb_concurrency_tickets 5000
innodb_data_file_path ibdata1:12M:autoextend
innodb_data_home_dir /rdsdbdata/db/innodb
innodb_ddl_buffer_size 1048576
innodb_ddl_threads 4
innodb_deadlock_detect ON
innodb_dedicated_server OFF
innodb_default_row_format dynamic
innodb_directories
innodb_disable_sort_file_cache OFF
innodb_doublewrite ON
innodb_doublewrite_batch_size 16
innodb_doublewrite_dir
innodb_doublewrite_files 2
innodb_doublewrite_pages 32
innodb_extend_and_initialize ON
innodb_fast_shutdown 1
innodb_file_per_table ON
innodb_fill_factor 100
innodb_flush_log_at_timeout 1
innodb_flush_log_at_trx_commit 2
innodb_flush_method O_DIRECT
innodb_flush_neighbors 0
innodb_flush_sync ON
innodb_flushing_avg_loops 30
innodb_force_load_corrupted OFF
innodb_force_recovery 0
innodb_fsync_threshold 0
innodb_ft_aux_table
innodb_ft_cache_size 8000000
innodb_ft_enable_diag_print OFF
innodb_ft_enable_stopword ON
innodb_ft_max_token_size 84
innodb_ft_min_token_size 3
innodb_ft_num_word_optimize 2000
innodb_ft_result_cache_limit 2000000000
innodb_ft_server_stopword_table
innodb_ft_sort_pll_degree 2
innodb_ft_total_cache_size 640000000
innodb_ft_user_stopword_table
innodb_idle_flush_pct 100
innodb_io_capacity 200
innodb_io_capacity_max 2000
innodb_lock_wait_timeout 120
innodb_log_buffer_size 8388608
innodb_log_checksums ON
innodb_log_compressed_pages ON
innodb_log_file_size 134217728
innodb_log_files_in_group 2
innodb_log_group_home_dir /rdsdbdata/log/innodb
innodb_log_spin_cpu_abs_lwm 80
innodb_log_spin_cpu_pct_hwm 50
innodb_log_wait_for_flush_spin_hwm 400
innodb_log_write_ahead_size 8192
innodb_log_writer_threads ON
innodb_lru_scan_depth 1024
innodb_max_dirty_pages_pct 90.000000
innodb_max_dirty_pages_pct_lwm 10.000000
innodb_max_purge_lag 0
innodb_max_purge_lag_delay 0
innodb_max_undo_log_size 1073741824
innodb_monitor_disable
innodb_monitor_enable
innodb_monitor_reset
innodb_monitor_reset_all
innodb_numa_interleave OFF
innodb_old_blocks_pct 37
innodb_old_blocks_time 1000
innodb_online_alter_log_max_size 134217728
innodb_open_files 4000
innodb_optimize_fulltext_only OFF
innodb_page_cleaners 4
innodb_page_size 16384
innodb_parallel_read_threads 4
innodb_print_all_deadlocks ON
innodb_print_ddl_logs OFF
innodb_purge_batch_size 300
innodb_purge_rseg_truncate_frequency 128
innodb_purge_threads 1
innodb_random_read_ahead OFF
innodb_read_ahead_threshold 56
innodb_read_io_threads 4
innodb_read_only OFF
innodb_redo_log_archive_dirs
innodb_redo_log_capacity 2147483648
innodb_redo_log_encrypt OFF
innodb_replication_delay 0
innodb_rollback_on_timeout OFF
innodb_rollback_segments 128
innodb_segment_reserve_factor 12.500000
innodb_sort_buffer_size 1048576
innodb_spin_wait_delay 6
innodb_spin_wait_pause_multiplier 50
innodb_stats_auto_recalc ON
innodb_stats_include_delete_marked OFF
innodb_stats_method nulls_equal
innodb_stats_on_metadata OFF
innodb_stats_persistent ON
innodb_stats_persistent_sample_pages 20
innodb_stats_transient_sample_pages 8
innodb_status_output OFF
innodb_status_output_locks OFF
innodb_strict_mode ON
innodb_sync_array_size 1
innodb_sync_spin_loops 30
innodb_table_locks ON
innodb_temp_data_file_path ibtmp1:12M:autoextend
innodb_temp_tablespaces_dir ./#innodb_temp/
innodb_thread_concurrency 0
innodb_thread_sleep_delay 10000
innodb_tmpdir
innodb_undo_directory ./
innodb_undo_log_encrypt OFF
innodb_undo_log_truncate ON
innodb_undo_tablespaces 2
innodb_use_fdatasync ON
innodb_use_native_aio ON
innodb_validate_tablespace_paths ON
innodb_version 8.0.39
innodb_write_io_threads 4
insert_id 0
interactive_timeout 28800
internal_tmp_mem_storage_engine TempTable
join_buffer_size 262144
keep_files_on_create OFF
key_buffer_size 16777216
key_cache_age_threshold 300
key_cache_block_size 1024
key_cache_division_limit 100
keyring_operations ON
large_files_support ON
large_page_size 0
large_pages OFF
last_insert_id 0
lc_messages en_US
lc_messages_dir /rdsdbbin/mysql-8.0.39.R1/share/
lc_time_names en_US
license GPL
local_infile ON
lock_wait_timeout 31536000
locked_in_memory OFF
log_bin ON
log_bin_basename /rdsdbdata/log/binlog/mysql-bin-changelog
log_bin_index /rdsdbdata/log/binlog/mysql-bin-changelog.index
log_bin_trust_function_creators OFF
log_bin_use_v1_row_events OFF
log_error /rdsdbdata/log/error/mysql-error.log
log_error_services log_filter_internal; log_sink_internal
log_error_suppression_list MY-013360
log_error_verbosity 2
log_output TABLE
log_queries_not_using_indexes OFF
log_raw OFF
log_replica_updates ON
log_slave_updates ON
log_slow_admin_statements OFF
log_slow_extra OFF
log_slow_replica_statements OFF
log_slow_slave_statements OFF
log_statements_unsafe_for_binlog OFF
log_throttle_queries_not_using_indexes 0
log_timestamps UTC
long_query_time 10.000000
low_priority_updates OFF
lower_case_file_system OFF
lower_case_table_names 0
mandatory_roles
master_info_repository TABLE
master_verify_checksum OFF
max_allowed_packet 67108864
max_binlog_cache_size 18446744073709547520
max_binlog_size 134217728
max_binlog_stmt_cache_size 18446744073709547520
max_connect_errors 100
max_connections 1300
max_delayed_threads 20
max_digest_length 1024
max_error_count 1024
max_execution_time 0
max_heap_table_size 16777216
max_insert_delayed_threads 20
max_join_size 18446744073709551615
max_length_for_sort_data 4096
max_points_in_geometry 65536
max_prepared_stmt_count 16382
max_relay_log_size 0
max_seeks_for_key 18446744073709551615
max_sort_length 1024
max_sp_recursion_depth 0
max_user_connections 0
max_write_lock_count 18446744073709551615
min_examined_row_limit 0
myisam_data_pointer_size 6
myisam_max_sort_file_size 9223372036853727232
myisam_mmap_size 18446744073709551615
myisam_recover_options OFF
myisam_sort_buffer_size 8388608
myisam_stats_method nulls_unequal
myisam_use_mmap OFF
mysql_native_password_proxy_users OFF
net_buffer_length 16384
net_read_timeout 30
net_retry_count 10
net_write_timeout 60
new OFF
ngram_token_size 2
offline_mode OFF
old OFF
old_alter_table OFF
open_files_limit 1048576
optimizer_max_subgraph_pairs 100000
optimizer_prune_level 1
optimizer_search_depth 62
optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
optimizer_trace enabled=off,one_line=off
optimizer_trace_features greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on
optimizer_trace_limit 1
optimizer_trace_max_mem_size 1048576
optimizer_trace_offset -1
original_commit_timestamp 36028797018963968
original_server_version 999999
parser_max_mem_size 18446744073709551615
partial_revokes OFF
password_history 0
password_require_current OFF
password_reuse_interval 0
performance_schema ON
performance_schema_accounts_size -1
performance_schema_digests_size 10000
performance_schema_error_size 5314
performance_schema_events_stages_history_long_size 10000
performance_schema_events_stages_history_size 10
performance_schema_events_statements_history_long_size 10000
performance_schema_events_statements_history_size 10
performance_schema_events_transactions_history_long_size 10000
performance_schema_events_transactions_history_size 10
performance_schema_events_waits_history_long_size 10000
performance_schema_events_waits_history_size 10
performance_schema_hosts_size -1
performance_schema_max_cond_classes 150
performance_schema_max_cond_instances -1
performance_schema_max_digest_length 1024
performance_schema_max_digest_sample_age 60
performance_schema_max_file_classes 80
performance_schema_max_file_handles 32768
performance_schema_max_file_instances -1
performance_schema_max_index_stat -1
performance_schema_max_memory_classes 450
performance_schema_max_metadata_locks -1
performance_schema_max_mutex_classes 350
performance_schema_max_mutex_instances -1
performance_schema_max_prepared_statements_instances -1
performance_schema_max_program_instances -1
performance_schema_max_rwlock_classes 60
performance_schema_max_rwlock_instances -1
performance_schema_max_socket_classes 10
performance_schema_max_socket_instances -1
performance_schema_max_sql_text_length 1024
performance_schema_max_stage_classes 175
performance_schema_max_statement_classes 219
performance_schema_max_statement_stack 10
performance_schema_max_table_handles -1
performance_schema_max_table_instances -1
performance_schema_max_table_lock_stat -1
performance_schema_max_thread_classes 100
performance_schema_max_thread_instances -1
performance_schema_session_connect_attrs_size 512
performance_schema_setup_actors_size -1
performance_schema_setup_objects_size -1
performance_schema_show_processlist OFF
performance_schema_users_size -1
persist_only_admin_x509_subject
persist_sensitive_variables_in_plaintext ON
persisted_globals_load OFF
pid_file /rdsdbdata/log/mysql-3306.pid
plugin_dir /rdsdbbin/mysql-8.0.39.R1/lib/plugin/
port 3306
preload_buffer_size 32768
print_identified_with_as_hex OFF
profiling OFF
profiling_history_size 15
protocol_compression_algorithms zlib,zstd,uncompressed
protocol_version 10
proxy_user
pseudo_replica_mode OFF
pseudo_slave_mode OFF
pseudo_thread_id 825240
query_alloc_block_size 8192
query_prealloc_size 8192
rand_seed1 0
rand_seed2 0
range_alloc_block_size 4096
range_optimizer_max_mem_size 8388608
rbr_exec_mode STRICT
read_buffer_size 262144
read_only OFF
read_rnd_buffer_size 524288
regexp_stack_limit 8000000
regexp_time_limit 32
relay_log /rdsdbdata/log/relaylog/relaylog
relay_log_basename /rdsdbdata/log/relaylog/relaylog
relay_log_index /rdsdbdata/log/relaylog/relaylog.index
relay_log_info_file relay-log.info
relay_log_info_repository TABLE
relay_log_purge ON
relay_log_recovery ON
relay_log_space_limit 0
replica_allow_batching ON
replica_checkpoint_group 512
replica_checkpoint_period 300
replica_compressed_protocol OFF
replica_exec_mode IDEMPOTENT
replica_load_tmpdir /rdsdbdata/tmp
replica_max_allowed_packet 1073741824
replica_net_timeout 60
replica_parallel_type LOGICAL_CLOCK
replica_parallel_workers 4
replica_pending_jobs_size_max 134217728
replica_preserve_commit_order ON
replica_skip_errors OFF
replica_sql_verify_checksum ON
replica_transaction_retries 10
replica_type_conversions
replication_optimize_for_static_plugin_config OFF
replication_sender_observe_commit_only OFF
report_host
report_password
report_port 3306
report_user
require_row_format OFF
require_secure_transport OFF
resultset_metadata FULL
rpl_read_size 8192
rpl_stop_replica_timeout 31536000
rpl_stop_slave_timeout 31536000
schema_definition_cache 256
secondary_engine_cost_threshold 100000.000000
secure_file_priv /secure_file_priv_dir/
select_into_buffer_size 131072
select_into_disk_sync OFF
select_into_disk_sync_delay 0
server_id 1342268138
server_id_bits 32
server_uuid 4282e0f1-a1ff-11ef-8113-12a4b0e41567
session_track_gtids OFF
session_track_schema ON
session_track_state_change OFF
session_track_system_variables time_zone,autocommit,character_set_client,character_set_results,character_set_connection
session_track_transaction_info OFF
sha256_password_auto_generate_rsa_keys ON
sha256_password_private_key_path private_key.pem
sha256_password_proxy_users OFF
sha256_password_public_key_path public_key.pem
show_create_table_skip_secondary_engine OFF
show_create_table_verbosity OFF
show_gipk_in_create_table_and_information_schema ON
show_old_temporals OFF
skip_external_locking ON
skip_name_resolve OFF
skip_networking OFF
skip_replica_start ON
skip_show_database OFF
skip_slave_start ON
slave_allow_batching ON
slave_checkpoint_group 512
slave_checkpoint_period 300
slave_compressed_protocol OFF
slave_exec_mode IDEMPOTENT
slave_load_tmpdir /rdsdbdata/tmp
slave_max_allowed_packet 1073741824
slave_net_timeout 60
slave_parallel_type LOGICAL_CLOCK
slave_parallel_workers 4
slave_pending_jobs_size_max 134217728
slave_preserve_commit_order ON
slave_rows_search_algorithms INDEX_SCAN,HASH_SCAN
slave_skip_errors OFF
slave_sql_verify_checksum ON
slave_transaction_retries 10
slave_type_conversions
slow_launch_time 2
slow_query_log OFF
slow_query_log_file /rdsdbdata/log/slowquery/mysql-slowquery.log
socket /tmp/mysql.sock
sort_buffer_size 262144
source_verify_checksum OFF
sql_auto_is_null OFF
sql_big_selects ON
sql_buffer_result OFF
sql_generate_invisible_primary_key OFF
sql_log_bin ON
sql_log_off OFF
sql_mode STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
sql_notes ON
sql_quote_show_create ON
sql_replica_skip_counter 0
sql_require_primary_key OFF
sql_safe_updates OFF
sql_select_limit 18446744073709551615
sql_slave_skip_counter 0
sql_warnings OFF
ssl_ca /rdsdbdata/rds-metadata/ca-cert.pem
ssl_capath
ssl_cert /rdsdbdata/rds-metadata/server-cert.pem
ssl_cipher
ssl_crl
ssl_crlpath
ssl_fips_mode OFF
ssl_key /rdsdbdata/rds-metadata/server-key.pem
ssl_session_cache_mode ON
ssl_session_cache_timeout 300
statement_id 5394927486
stored_program_cache 256
stored_program_definition_cache 256
super_read_only OFF
sync_binlog 1
sync_master_info 10000
sync_relay_log 10000
sync_relay_log_info 10000
sync_source_info 10000
system_time_zone UTC
table_definition_cache 2000
table_encryption_privilege_check OFF
table_open_cache 4000
table_open_cache_instances 16
tablespace_definition_cache 256
temptable_max_mmap 1073741824
temptable_max_ram 1073741824
temptable_use_mmap ON
terminology_use_previous BEFORE_8_0_26
thread_cache_size 21
thread_handling one-thread-per-connection
thread_stack 262144
time_zone UTC
timestamp 1734711364.329137
tls_ciphersuites
tls_version TLSv1.2,TLSv1.3
tmp_table_size 16777216
tmpdir /rdsdbdata/tmp
transaction_alloc_block_size 8192
transaction_allow_batching OFF
transaction_isolation REPEATABLE-READ
transaction_prealloc_size 4096
transaction_read_only OFF
transaction_write_set_extraction XXHASH64
unique_checks ON
updatable_views_with_limit YES
use_secondary_engine ON
version 8.0.39
version_comment Source distribution
version_compile_machine x86_64
version_compile_os Linux
version_compile_zlib 1.3.1
wait_timeout 28800
warning_count 0
windowing_use_high_precision ON
xa_detach_on_prepare ON

@leobarcellos
Contributor Author

@pushchris Since I'm no database specialist, I asked ChatGPT for help. This recommendation seems like a good one, but since you know the system better, I'll wait for your input:


# 2. Use (type, delay_until, journey_id) Composite Index

You already have an index on (type, delay_until), but the queries in your deadlocks also filter on journey_id. Adding journey_id to that index can help MySQL access only the relevant rows more selectively, and lock fewer rows overall. For example:

ALTER TABLE journey_user_step
  DROP INDEX journey_user_step_type_delay_until_index,
  ADD INDEX journey_user_step_type_delay_until_jid_index (type, delay_until, journey_id);

This ensures your large updates will lock only the rows relevant to that journey, rather than scanning or locking pages for other journey_id values that happen to share the same type/delay_until range.


There is also this recommendation, but I had never heard of this one:


4. Check/Adjust Transaction Isolation Level

If your use case can tolerate slightly less strict isolation, switching from the default REPEATABLE-READ to READ-COMMITTED can reduce locking conflicts, especially for updates involving large row sets. In many high-throughput systems, READ COMMITTED is used to lessen concurrency overhead. Example:

SET GLOBAL transaction_isolation='READ-COMMITTED';
-- or for a specific session
SET SESSION transaction_isolation='READ-COMMITTED';

Evaluate carefully whether this is acceptable for your business logic.


Most of the deadlocks and lock waits involve the campaign_sends and journey_user_step tables.
I guess the campaign_sends ones are related to the stalled campaigns that get updated to 'failed'.

@pushchris
Contributor

The first suggestion shouldn't be applicable: there is already an index on journey_id from the foreign key, which is why it's not part of the composite index. Double-check that it exists in your DB; it gets added in 20220730041531_create_journeys.js.

The campaign send locks you've posted above are all related to generating the initial list of users to send to (they are all inserts rather than updates), and they are gap locks. In general, deadlocks are a normal part of life in MySQL, but in this case they aren't being retried, since they happen as part of list generation, and it seems they interrupt the flow. Ultimately, the likelihood of a deadlock during insertion could be reduced by removing the duplicate-entry logic, but that opens a can of worms: duplicate sends would become possible, and it would interrupt the batch inserting. Will explore whether there are better options available.
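Since the list-generation deadlocks are not retried, one common mitigation is to re-run the whole transaction when MySQL chooses it as the deadlock victim. The TypeScript below is a minimal sketch under stated assumptions, not Parcelvoy or Knex code: `withDeadlockRetry`, `isRetryableLockError`, and the option names are hypothetical. It relies on mysql2 errors exposing a `code` (`ER_LOCK_DEADLOCK`, `ER_LOCK_WAIT_TIMEOUT`) and an `errno` (1213, 1205).

```typescript
// Hypothetical helper (not part of Parcelvoy or Knex): retries an async unit of
// work when MySQL reports a deadlock or a lock wait timeout.

interface RetryOptions {
  retries?: number; // extra attempts allowed after the first failure
  baseDelayMs?: number; // backoff base, doubled on every retry
}

// mysql2 errors expose a string `code` and a numeric `errno`.
function isRetryableLockError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, errno } = err as { code?: unknown; errno?: unknown };
  return (
    code === "ER_LOCK_DEADLOCK" || // errno 1213
    code === "ER_LOCK_WAIT_TIMEOUT" || // errno 1205
    errno === 1213 ||
    errno === 1205
  );
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function withDeadlockRetry<T>(
  work: () => Promise<T>,
  { retries = 3, baseDelayMs = 50 }: RetryOptions = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      // `work` should run a whole transaction: InnoDB rolls back the entire
      // deadlock victim, so replaying only the last statement is not enough.
      return await work();
    } catch (err) {
      // Give up on non-lock errors or once the retry budget is spent.
      if (attempt >= retries || !isRetryableLockError(err)) throw err;
      // Exponential backoff plus random jitter so competing workers spread out.
      const delay = baseDelayMs * 2 ** attempt;
      await sleep(delay + Math.random() * delay);
    }
  }
}
```

In use, `work` would wrap an entire transaction, e.g. `withDeadlockRetry(() => db.transaction(async (trx) => { /* batch insert */ }))`, where `db` is a hypothetical Knex instance. Because the failed transaction is rolled back before the retry, this keeps the existing duplicate-entry logic intact rather than removing it.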

@pushchris
Contributor

Give #589 a try. It's a bit of a shot in the dark, since MySQL does some weird things, but the theory is that MySQL has to gap-lock the range it is inserting into in order to handle duplicates. Changing the isolation level isn't really an option, since it's set at the connection level and would affect every other query as well.
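MySQL's own deadlock guidance recommends touching rows in a consistent order. One way to apply that to batch inserts, offered here as an assumption and not as what #589 does, is to sort each batch by its key before chunking, so concurrent workers request locks in ascending key order. The `SendRow` shape and `orderedChunks` helper below are hypothetical; this may narrow the window for gap-lock deadlocks but does not eliminate it.

```typescript
// Hypothetical row shape; the real campaign_sends columns may differ.
interface SendRow {
  campaign_id: number;
  user_id: number;
}

// Sorts a copy of `rows` by (campaign_id, user_id) and splits it into chunks of
// `size`, so every worker inserts keys in the same ascending order.
function orderedChunks<T extends SendRow>(rows: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...rows].sort(
    (a, b) => a.campaign_id - b.campaign_id || a.user_id - b.user_id,
  );
  const chunks: T[][] = [];
  for (let i = 0; i < sorted.length; i += size) {
    chunks.push(sorted.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk would then be handed, in order, to the existing batch insert.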

@leobarcellos
Copy link
Contributor Author

Gonna test it out! Thanks!

@leobarcellos
Copy link
Contributor Author

I went ahead and changed the index as shown below, replacing it with a new one covering all three columns.

ALTER TABLE journey_user_step
  DROP INDEX journey_user_step_type_delay_until_index,
  ADD INDEX journey_user_step_jid_type_delay_idx
      (journey_id, type, delay_until);

It really helped with the deadlocks on journey_user_step. Below are two screenshots: the first from before the change and the second from after the table structure update. Basically, it fixed the long wait times on the journey_user_step update.

[Screenshots: 2024-12-27 at 12:52:02 PM (before) and 2024-12-27 at 12:52:33 PM (after)]
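One plausible reason the reordered index helped, assuming the update filters with equality on journey_id and type and a range on delay_until (the exact query isn't shown in this thread), is MySQL's leftmost-prefix rule: a range scan can use a run of equality key parts followed by at most one range key part. The TypeScript below is a simplified, hypothetical model of that rule (it ignores index condition pushdown, skip scan, and index merge) comparing the two indexes discussed above.

```typescript
// Simplified, hypothetical model of MySQL's leftmost-prefix rule for B-tree
// range scans. It ignores index condition pushdown, skip scan and index merge.
type Predicate = "eq" | "range";

function usableKeyParts(
  indexColumns: readonly string[],
  where: Partial<Record<string, Predicate>>,
): string[] {
  const used: string[] = [];
  for (const column of indexColumns) {
    const kind = where[column];
    if (kind === undefined) break; // a gap in the prefix ends the usable part
    used.push(column);
    if (kind === "range") break; // key parts after a range column can't narrow the scan
  }
  return used;
}

// Assumed predicates of the journey_user_step update (the real query may differ):
// journey_id = ? AND type = ? AND delay_until <= ?
const where: Partial<Record<string, Predicate>> = {
  journey_id: "eq",
  type: "eq",
  delay_until: "range",
};

// Index suggested by ChatGPT: journey_id sits behind the range column.
const suggested = usableKeyParts(["type", "delay_until", "journey_id"], where);
// Index actually added: journey_id leads, so all three columns bound the scan.
const applied = usableKeyParts(["journey_id", "type", "delay_until"], where);

console.log(suggested, applied);
```

Under this model, `suggested` is `["type", "delay_until"]` and `applied` is all three columns: the first index still visits rows from other journeys in the same delay_until range (which a locking UPDATE can end up locking), while the second confines the scan to a single journey.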

Related to #541
