Discussion: auto-inc lock deadlocks? #27
Ah, these two solutions didn't seem to work for me:
We haven't encountered any deadlock scenarios so far. If your assumption is correct that this is related to auto-increment locks, it's likely because we run our MySQL cluster with `innodb_autoinc_lock_mode = 2`. Are you able to reproduce this scenario? Can you think of any changes to the generated queries that would avoid such deadlocks?
Ah, that's interesting that you guys set `innodb_autoinc_lock_mode = 2`. I haven't been able to reproduce the scenario at all :\ It has always happened on one specific large table, and only near the end of the migration. I initially thought it might have something to do with those last few chunks being written to frequently because they're newer, but it turns out it's choking on records that are a couple of hours old (and probably not being touched often during the migration).

I like the idea of documenting this, although it seems like we should hold off until someone else hits the same problem. LHM has worked great otherwise, and I haven't found anything online from people with a similar issue yet.
Any updates on this? I'm experiencing the same error: https://app.getsentry.com/playlist/production/group/17127363/. It seems fairly reproducible. I have
Since Sentry seems to be changing that error page, this was the error: https://app.getsentry.com/playlist/production/group/17127363/events/951399630/
@JacobWG we dropped the table on which this was happening, but haven't seen it since then [yet].

Dropped and re-added, or is the table no longer necessary to your app? Like, did you fix the deadlock somehow or did it just become irrelevant?

Nah, the table became irrelevant so we just got rid of it.

Okay - for now, I'm experimenting with https://github.com/qertoip/transaction_retry, so we'll see how far that gets.
GL, hope it works well! That reminds me, we did end up monkey-patching LHM to retry deadlocks around the same time, but I'm not sure how often it has worked since then (this is also rescuing a different error than the gem you mentioned):
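A minimal sketch of what a retry patch along those lines could look like, assuming `Lhm::Chunker#execute` is the chunked-copy loop in the installed LHM version and that deadlocks surface as `ActiveRecord::StatementInvalid` (both are assumptions to verify against the version you run, not the original patch):

```ruby
# Hypothetical reconstruction of a deadlock-retry monkey patch for LHM.
# Check that Lhm::Chunker#execute is the copy loop in your LHM version.
require 'lhm'

module LhmDeadlockRetry
  MAX_RETRIES = 10

  def execute
    attempts = 0
    begin
      super
    rescue ActiveRecord::StatementInvalid => e
      raise unless e.message =~ /Deadlock found when trying to get lock/
      attempts += 1
      raise if attempts > MAX_RETRIES
      sleep(attempts) # crude linear backoff before retrying the copy
      retry
    end
  end
end

Lhm::Chunker.prepend(LhmDeadlockRetry)
```

The chunked copy in LHM versions from this era is an `INSERT IGNORE`, so re-running it after a deadlock should not duplicate rows; still, treat this as a starting point rather than a drop-in fix.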
Thanks for that! I may try it if the gem doesn't work out...

@jacobwgillespie Hi!! I've been having the same issue. Did the transaction_retry gem solve your problem?

Yep, the gem's been working for me - I'm logging the warnings about transaction retries to the error monitor (Bugsnag) and the gem's taking care of them nicely.

@jacobwgillespie Thank you so much for the quick response. :)

👍
Hi, we have the same problem. It's not at the end of the migration (we have 1.9 billion entries); it happens around 87 million rows in. But the table is very heavily used: data is only added and never changed, though entries are deleted. We did test it on our test system with live data, and with no load on the table it works. Any ideas what we can do? We will soon reach the integer boundary for our ID and need to migrate to bigint ;-)
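For context, the int-to-bigint primary key change being described would look roughly like the sketch below with LHM; the table name and column definitions are placeholders, not the reporter's actual schema:

```ruby
# Illustrative only: widen the primary key to BIGINT via LHM.
# The shadow-table copy this kicks off is exactly where the deadlocks
# discussed in this thread appear on heavily written tables.
require 'lhm'

class ChangeEntriesIdToBigint < ActiveRecord::Migration
  def up
    Lhm.change_table :entries do |m|
      m.change_column :id, "BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT"
    end
  end

  def down
    Lhm.change_table :entries do |m|
      m.change_column :id, "INT(11) UNSIGNED NOT NULL AUTO_INCREMENT"
    end
  end
end
```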
If the transaction_retry gem doesn't work for you, you could always set up a parallel table and some MySQL triggers to propagate changes from your primary table to the new one while you're moving data over. For an even more creative solution, you could try switching from InnoDB to TokuDB and take advantage of its hot schema updates: add a new column for the new ID and rename it once it's ready to function as the ID. While the data is migrating with an INSERT query, you can adjust your Rails model with a before_save callback to make sure the new_id column gets populated.
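A rough sketch of that before_save idea, assuming the interim column is literally called new_id and the model is called Entry (both placeholder names):

```ruby
# Illustrative only: keep a hypothetical new_id column in sync on every
# write while older rows are backfilled separately.
class Entry < ActiveRecord::Base
  # Existing rows already have an id, so mirror it before each save;
  # new rows only receive an id on INSERT, so copy it right afterwards.
  before_save  :mirror_id_into_new_id, if: :persisted?
  after_create :mirror_id_into_new_id_on_insert

  private

  def mirror_id_into_new_id
    self.new_id ||= id
  end

  def mirror_id_into_new_id_on_insert
    update_column(:new_id, id) if new_id.nil?
  end
end
```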
Thank you.
We ran into this issue when migrating a table with about 6,000,000 rows. The database globally gets about 25k calls per minute from the Rails app. It's weighted toward reads, but the table in question has more frequent writes than any other in our database. Here is the deadlock as saved from `SHOW ENGINE INNODB STATUS`:

With the default stride of 40,000 and throttle of 0.1 seconds, we got this error (approximately; it's always on a different row) very quickly. With a stride of 5,000 and throttle of 0.8 seconds, it takes ~20 minutes on average before it occurs. It seems very random though, maybe based on user behavior.

My understanding, possibly incorrect, is that the batch insert acquires a gap lock on the temporary table while, at the same time, a trigger tries updating a row in that batch. I am not an expert on this topic, but this Percona post seems to explain the problem and some potential solutions: http://www.percona.com/blog/2012/03/27/innodbs-gap-locks/. However, our database runs at the isolation level

Worth noting that, when this deadlock occurs for us, it seems like it also holds a lock on the non-lhm table, which in turn crashes our app. I'm not sure why that happens, but it's scary and counter to lhm's core promise of safe non-blocking migrations.

Thank you to those above who recommended the transaction_retry gem and the lhm retry monkey patch. We'll give those a shot. If that fails, we may need to change our replication setup and isolation level for this migration. I'm not sure lhm is doing anything wrong here, but it might be good to document the issue at least.

EDIT: The
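For anyone tuning those knobs, stride and throttle are passed as options to Lhm.change_table in the LHM versions this thread is about; option names and units (milliseconds vs. seconds) have changed between releases, so check the API of the version you run. A sketch with placeholder table and column names:

```ruby
# Smaller chunks and longer pauses shrink the window in which a chunk's
# INSERT ... SELECT can collide with trigger writes on the shadow table.
# Assumes an older LHM where :throttle is given in milliseconds.
require 'lhm'

Lhm.change_table :heavily_written_table, stride: 5_000, throttle: 800 do |m|
  m.add_column :some_new_column, "VARCHAR(255)"
end
```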
I believe this is fixed on master, as we now have code to retry if deadlocks are found: #93.
@arthurnn I don't think that PR actually addresses the deadlock issue. It only affects the switcher, and the LHMs discussed in this issue fail well before the table switch. Could this be re-opened?

See also: #107?
Curious if anyone else has experienced a deadlock during an LHM migration where, for instance, on a busy table Transaction 1 (a `replace into ...`) is waiting for Transaction 2 (an `insert into ...`) with `lock mode AUTO-INC`, and the transactions get rolled back?

In the past we've seen this once, and the most obvious answer seems to be that it's just an auto-increment lock deadlock. However, in most cases you're copying over the `id` field from table to table, which is typically an auto-increment field.

To fix this, one could remove the auto-increment from that field before the LHM and add it back to the new table's field right before the table switch. Another option would be to set `innodb_autoinc_lock_mode = 2` in MySQL temporarily until the table switch.

My question is: would this be a good configurable option for LHM, since it applies to a typical database setup?
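One practical note on the second option: innodb_autoinc_lock_mode is a read-only server variable, so it has to be set in my.cnf and picked up on a restart rather than toggled from the migration itself. A small check (plain ActiveRecord, nothing LHM-specific) to confirm what the cluster is actually running with:

```ruby
# Read the server's current AUTO-INC locking behaviour:
# 0 = traditional, 1 = consecutive (the pre-8.0 default), 2 = interleaved.
# Mode 2 avoids the table-level AUTO-INC lock entirely, but is only
# replication-safe with row-based binary logging.
mode = ActiveRecord::Base.connection
                         .select_value("SELECT @@innodb_autoinc_lock_mode")
                         .to_i
puts "innodb_autoinc_lock_mode = #{mode}"
```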