Exploration to reduce deadlocks #589
base: main
Conversation
@pushchris I tried this PR, but I also merged some other PRs (#585, #586, #587, #588) before updating. On the performance view, the queue count kept climbing. I tried raising the Redis concurrency (10 -> 25) but with no success. Then I reverted our stack to the last PR before the update (#581), and weirdly the queued tasks dropped to 3k; not sure what happened, queue throughput stayed the same (maybe it was a bug in the performance view). Btw, I will increase server capacity (EC2 and RDS) later to see if it mitigates some of these errors; the EC2 and RDS instances are floating around 80% CPU.
@leobarcellos my guess is that the list evaluation queue is literally never getting handled because of the new priorities, so your workers never reach an empty queue state where they would pick it up. I've reverted that in this PR; the more I think about it, the less helpful different priorities seem, and they will mostly just cause issues. We could potentially re-evaluate moving everything to a lower-priority queue and then selectively promoting jobs to high priority, but even that could cause lots of issues that are hard for folks to immediately identify. See if the latest commit to this PR stops the issue from happening for you.
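A minimal sketch of the starvation concern described above (this is not Parcelvoy's actual queue code; the job names and priority scheme are illustrative): if workers always drain the highest-priority work first and that work never stops arriving, jobs enqueued at a lower priority are never reached.

```ts
// Illustrative only: lower number = higher priority.
type Job = { name: string; priority: number }

const queue: Job[] = []

function enqueue(job: Job) {
    queue.push(job)
    queue.sort((a, b) => a.priority - b.priority)
}

function takeNext(): Job | undefined {
    // Always drains priority-1 work first; the priority-2 "list_evaluate"
    // job only runs once the priority-1 backlog is completely empty.
    return queue.shift()
}

enqueue({ name: 'send_email', priority: 1 })
enqueue({ name: 'list_evaluate', priority: 2 })
enqueue({ name: 'send_email', priority: 1 })
// takeNext() returns both sends before the list evaluation ever runs.
```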
@pushchris Thanks! Updated it ~20 minutes ago, looks good to me. By the way, the last deadlock was still on Dec 25 at 00h, so it's been 60+ hours without new deadlocks. To be honest, I don't know if that's 100% related to this PR, since I went back and forth with this PR and still didn't hit a new deadlock. I think it was the parameter tweaking and the index update on journey_user_stat.
So bizarre on that index. In theory there is already an index on
@leobarcellos how have things been looking? Still no deadlocks since that addition?
@pushchris Hey there! Thanks for the update. So, no more deadlocks found on journey_user_stat. Since then I've received 3 or 4 deadlock errors on campaign_sends, like the ones below, which I think is fine. I saw that you updated failedStalled to update in chunks of 25; is the insertion done in chunks as well? About journeys, we do have a lot since there are many projects. Right now we have 165 journeys that are published and don't have the deleted_at column set, and the journey_user_step auto-increment is at 47M with a data length of 7.2G. Would it be good practice to have some sort of cleanup?
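One possible shape for that cleanup, sketched with Knex under stated assumptions (this is not an official Parcelvoy utility; the `created_at` column, the retention window, and the connection settings are guesses, adjust them to the real schema): deleting in small primary-key chunks keeps each statement's row locks brief instead of one huge transaction locking millions of rows, which is the same idea as the chunked failedStalled update.

```ts
import knex from 'knex'

// Connection details are placeholders.
const db = knex({
    client: 'mysql2',
    connection: { host: 'localhost', user: 'app', password: 'secret', database: 'parcelvoy' },
})

// Hypothetical pruning job: remove old journey_user_step rows in small batches.
async function pruneJourneyUserSteps(olderThanDays = 90, chunkSize = 1000) {
    const cutoff = new Date(Date.now() - olderThanDays * 24 * 60 * 60 * 1000)
    while (true) {
        // Select a small batch of ids first, then delete by primary key,
        // so each delete holds locks on a bounded set of rows.
        const ids = await db('journey_user_step')
            .where('created_at', '<', cutoff)
            .orderBy('id')
            .limit(chunkSize)
            .pluck('id')
        if (ids.length === 0) break
        await db('journey_user_step').whereIn('id', ids).delete()
    }
}
```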
A simple change to potentially reduce gap locks by not having to re-select records for update.
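A hedged illustration of that idea (the table comes from the discussion above, but the state values and function names are illustrative, not the actual Parcelvoy code): under InnoDB's default REPEATABLE READ isolation, a range `SELECT ... FOR UPDATE` takes next-key (gap) locks, so re-selecting rows that were already fetched earlier widens the locked range and invites deadlocks. Updating directly by the primary keys already in hand only takes record locks on those rows.

```ts
import { Knex } from 'knex'

// Pattern prone to gap locks: re-select a range for update, then write.
async function requeueStalledWithReselect(db: Knex) {
    await db.transaction(async trx => {
        const rows = await trx('campaign_sends')
            .where('state', 'throttled') // range predicate => next-key locks
            .forUpdate()
            .select('id')
        await trx('campaign_sends')
            .whereIn('id', rows.map(r => r.id))
            .update({ state: 'pending' })
    })
}

// Pattern this change moves toward: the caller already knows the ids,
// so update them directly and skip the locking range scan entirely.
async function requeueStalledByIds(db: Knex, ids: number[]) {
    await db('campaign_sends').whereIn('id', ids).update({ state: 'pending' })
}
```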