KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? #584
Can you tell me what MySQL engine version you are using? |
Sure! It's on AWS RDS, Community MySQL. |
@pushchris Since I'm no database specialist, I asked ChatGPT for help. This recommendation seems like a good one, but since you know the system better, I'll wait for your input:

2. Use a (type, delay_until, journey_id) Composite Index

You already have an index on (type, delay_until), but the queries in your deadlocks also filter on journey_id. Adding journey_id to that index can help MySQL access only the relevant rows more selectively and lock fewer rows overall. For example:

```sql
ALTER TABLE journey_user_step
  DROP INDEX journey_user_step_type_delay_until_index,
  ADD INDEX journey_user_step_type_delay_until_jid_index (type, delay_until, journey_id);
```

This ensures your large updates will lock only the rows relevant to that journey, rather than scanning or locking pages for other journey_id values that happen to share the same type/delay_until range.

There is also this recommendation, but this one I had never heard of before:

4. Check/Adjust the Transaction Isolation Level

If your use case can tolerate slightly less strict isolation, switching from the default REPEATABLE-READ to READ-COMMITTED can reduce locking conflicts, especially for updates involving large row sets. In many high-throughput systems, READ COMMITTED is used to lessen concurrency overhead. Example:

```sql
SET GLOBAL transaction_isolation = 'READ-COMMITTED';
-- or for a specific session
SET SESSION transaction_isolation = 'READ-COMMITTED';
```

Evaluate carefully whether this is acceptable for your business logic.

The majority of deadlocks and lock waits are related to the |
The first suggestion shouldn't be applicable; there is already an index on

The campaign send locks you've posted above are all related to generating the initial list of users to send to (they are all inserts vs. updates) and are gap locks. In general, deadlocks are a normal part of life in MySQL, but in this case they aren't retrying because they are part of that list generation and interrupt the flow, it seems. Ultimately, the likelihood of a deadlock during insertion can be reduced by removing the duplicate-entry logic, but that opens a can of worms of duplicate sends becoming possible and would interrupt the batch inserting. Will explore whether there are better options available |
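As an aside, when deadlocks are considered retriable, a common pattern is to replay the whole transaction when MySQL reports errno 1213 (deadlock) or 1205 (lock wait timeout). A minimal sketch assuming a Knex instance; the withDeadlockRetry helper and retry counts are hypothetical and not the project's actual list-generation code:

```ts
import { Knex } from 'knex'

// Hypothetical helper: replay a Knex transaction when MySQL reports a
// deadlock (errno 1213) or a lock wait timeout (errno 1205).
async function withDeadlockRetry<T>(
    db: Knex,
    work: (trx: Knex.Transaction) => Promise<T>,
    retries = 3,
): Promise<T> {
    for (let attempt = 1; ; attempt++) {
        try {
            return await db.transaction(work)
        } catch (err: any) {
            const retriable = err?.errno === 1213 || err?.errno === 1205
            if (!retriable || attempt >= retries) throw err
            // Brief backoff before replaying the whole transaction
            await new Promise(resolve => setTimeout(resolve, 50 * attempt))
        }
    }
}
```

Whether a blind replay is safe here depends on the duplicate-entry handling described above, so treat this purely as an illustration of the pattern.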
Give #589 a try; it's a bit of a shot in the dark since MySQL does some weird things, but the theory is that MySQL is having to gap lock the range it is inserting into in order to handle duplicates. Changing the isolation level isn't really an option since it's set at the connection level and would affect every other query as well |
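For completeness, the closest Knex gets to a per-session setting is the pool's afterCreate hook, and as noted above anything set there applies to every query that later runs on that pooled connection, which is exactly why it isn't a good fit here. A rough sketch, with the connection options left as placeholders:

```ts
import knex from 'knex'

const db = knex({
    client: 'mysql2',
    connection: { /* host, user, password, database */ },
    pool: {
        // Runs once per pooled connection; every query that later uses this
        // connection will then run under READ COMMITTED, not just the hot path.
        afterCreate: (conn: any, done: (err: Error | null, conn: any) => void) => {
            conn.query("SET SESSION transaction_isolation = 'READ-COMMITTED'", (err: Error | null) => {
                done(err, conn)
            })
        },
    },
})
```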
Gonna test it out! Thanks! |
I went ahead and changed the index as shown below, adding a new one with all three columns.
It really helped with the deadlocks on journey_user_step. Below are two screenshots: the first before the table structure update and the second after. Basically it fixed the long wait time on the journey_user_step update. Related to #541 |
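For anyone wanting to apply the same change through a Knex migration rather than raw SQL, a minimal sketch; the table, column, and index names are the ones discussed above, while the migration itself is hypothetical and not part of the project:

```ts
import { Knex } from 'knex'

export async function up(knex: Knex): Promise<void> {
    await knex.schema.alterTable('journey_user_step', (table) => {
        // New composite index that also covers journey_id, per the discussion above
        table.index(['type', 'delay_until', 'journey_id'], 'journey_user_step_type_delay_until_jid_index')
    })
}

export async function down(knex: Knex): Promise<void> {
    await knex.schema.alterTable('journey_user_step', (table) => {
        table.dropIndex(['type', 'delay_until', 'journey_id'], 'journey_user_step_type_delay_until_jid_index')
    })
}
```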
Not sure if related to #583, but we are facing some issues that are causing real trouble here.
When our workers receive this KnexTimeoutError, they just stall at 0% CPU and everything freezes.
When I restart the workers they come back, but eventually the error pops up again and everything repeats.
I even tried reducing the 6 workers to just 1 with more CPU and memory, but it happened just like before.
Things I already tried:
Log:
```
KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_MySQL2.acquireConnection (/usr/src/app/node_modules/knex/lib/client.js:332:26)
    at async Runner.ensureConnection (/usr/src/app/node_modules/knex/lib/execution/runner.js:305:28)
    at async Runner.run (/usr/src/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:9)
    at async Chunker.flush (/usr/src/app/build/utilities/index.js:242:13)
    at async chunk (/usr/src/app/build/utilities/index.js:221:5)
    at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)
    at async handler (/usr/src/app/build/campaigns/CampaignGenerateListJob.js:33:9)
    at async Queue.dequeue (/usr/src/app/build/queue/Queue.js:59:9)
    at async worker.bullmq_1.Worker.connection [as processFn] (/usr/src/app/build/queue/RedisQueueProvider.js:78:13)
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:455:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:640:24)

"stacktrace": KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at createQueryBuilder (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:320:26)
    at knex (/usr/src/app/node_modules/knex/lib/knex-builder/make-knex.js:101:12)
    at CampaignSend.table (/usr/src/app/build/core/Model.js:245:16)
    at CampaignSend.query (/usr/src/app/build/core/Model.js:59:21)
    at user_id.user_id [as callback] (/usr/src/app/build/campaigns/CampaignService.js:262:39)
    at Chunker.flush (/usr/src/app/build/utilities/index.js:242:24)
    at chunk (/usr/src/app/build/utilities/index.js:221:19)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async generateSendList (/usr/src/app/build/campaigns/CampaignService.js:261:5)

"job": {"name": "campaign_generate_list_job"
```
More logs:
Also, lock waits increased significantly, which is related to #541
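Not a fix for the underlying stall, but for reference the pool size and acquire timeout that the KnexTimeoutError message refers to are configurable when the Knex instance is created (Knex defaults to a pool max of 10 and a 60-second acquireConnectionTimeout). A sketch with those defaults spelled out; the connection options are placeholders:

```ts
import knex from 'knex'

const db = knex({
    client: 'mysql2',
    connection: { /* host, user, password, database */ },
    // How long a query waits for a free connection before throwing KnexTimeoutError
    acquireConnectionTimeout: 60000,
    pool: {
        min: 2,
        max: 10, // raise cautiously; each worker process gets its own pool
    },
})
```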