
Ensure EXISTS key is not orphaned when expire is used #39

Open: wants to merge 2 commits into master
Conversation

@jcalvert (Contributor) commented Jan 5, 2016

TL;DR: GETSET resets a key's TTL to indefinite. We need to ensure the key always has a TTL when :expiration is used.


An observed and rare production phenomenon (roughly 1 in a million) has been a deadlock condition when timeouts on expiring locks are not used. Upon inspection of Redis, we could see that the EXISTS key was present but had no TTL, and the AVAILABLE key was absent.

The problem occurs because the fix for block syntax in b0bbfda removed setting the expiration explicitly when the EXISTS key already exists, i.e. when token = @redis.getset(exists_key, EXISTS_TOKEN) returns a non-nil result.

This is problematic because GETSET resets the TTL. You can see this demonstrated in redis-cli:

127.0.0.1:6379[15]> getset 'foobar' '0'
(nil)
127.0.0.1:6379[15]> getset 'foobar' '0'
"0"
127.0.0.1:6379[15]> expire 'foobar' 999
(integer) 1
127.0.0.1:6379[15]> ttl 'foobar'
(integer) 996
127.0.0.1:6379[15]> getset 'foobar' '0'
"0"
127.0.0.1:6379[15]> ttl 'foobar'
(integer) -1
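The behavior follows from Redis write semantics: GETSET overwrites the key's value, and a value-overwriting write discards the key's TTL. Below is a minimal, self-contained Ruby model of that rule; FakeRedis is a hypothetical stand-in for illustration, not the redis-rb client.

```ruby
class FakeRedis
  def initialize
    @values = {}
    @ttls   = {}
  end

  # GETSET: return the old value, write the new one, and drop any TTL,
  # mirroring Redis, where an overwriting write clears the expiration.
  def getset(key, value)
    old = @values[key]
    @values[key] = value
    @ttls.delete(key)
    old
  end

  def expire(key, seconds)
    return 0 unless @values.key?(key)
    @ttls[key] = seconds
    1
  end

  # TTL conventions as in Redis: -2 for a missing key,
  # -1 for a key that exists but has no expiration.
  def ttl(key)
    return -2 unless @values.key?(key)
    @ttls.fetch(key, -1)
  end
end

r = FakeRedis.new
r.getset("foobar", "0")   # key created
r.expire("foobar", 999)   # TTL now set
r.getset("foobar", "0")   # the overwriting write discards the TTL
r.ttl("foobar")           # -1, matching the redis-cli session above
```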

This opens up the possibility that one process is in the critical section while a second process resets the TTL on the EXISTS key and then waits via BLPOP for the first process to complete. If the AVAILABLE key somehow expires, the second process will wait either indefinitely or until the lock times out. If the lock times out and is then retried, the EXISTS key will still be present, never expiring, so the retry will again wait without completing. If the expiration is set explicitly after GETSET, the retry will succeed after the expiration period because the semaphore will create a fresh set of keys.
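The fix described here amounts to re-applying the expiration unconditionally right after the GETSET, since the GETSET has just cleared it. A hedged sketch of that pattern: exists_key, EXISTS_TOKEN, and the expiration echo names used in redis-semaphore, but the helper method and its signature are illustrative, not the gem's actual code.

```ruby
EXISTS_TOKEN = "1".freeze

# Illustrative helper: claim (or observe) the EXISTS key and always
# restore its TTL. `redis` is any client responding to getset/expire,
# as redis-rb does. Returns true if this caller created the key.
def ensure_exists_with_ttl(redis, exists_key, expiration)
  token = redis.getset(exists_key, EXISTS_TOKEN)
  # Re-set the TTL even when the key already existed: GETSET just
  # cleared it, and skipping this step is what orphans the key.
  redis.expire(exists_key, expiration) if expiration
  token.nil?
end
```

With this in place, even if a lock holder dies before signaling, the EXISTS key expires after the configured period and a retry can rebuild the semaphore's keys from scratch.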

CC @dany1468 @dv

@redis2 = Redis.new :db => 15
threads.each{|t| t.kill } #ensure signal step fails
expect(@redis2.ttl("SEMAPHORE:my_semaphore:EXISTS").to_s).to_not eql("-1")
sleep 4.0 #allow blpop timeout to occur
A collaborator commented on the diff: Missing space after #.

@dv (Owner) commented Jan 24, 2016

Hey @jcalvert this looks good, thanks so much for contributing!

Could you check out the CI errors and see if you can get them fixed? If you can also fix the Hound comments, that'd be great!

I'll merge it then, thanks!

@dv added the needs work label on Jan 24, 2016
@jcalvert (Contributor, Author) commented Feb 4, 2016

@dv I'll try to clean this up here soon!


it "does not leave a key without expiration if expiration given" do
queue = Queue.new
threads = 2.times.map do
A collaborator commented on the diff: Use Array.new with a block instead of .times.map.

Commit messages (truncated):

…ng orphaned. We share Redis clients between threads (the redis object is thread-safe), and because of this the second thread will block waiting for the AVAILABLE key, so the first thread is unable to finish cleanup. Eventually the AVAILABLE key expires but the EXISTS key does not. The failure of the signal() call could be due to loss of network connection, a process crash, etc. Leaving a non-expiring EXISTS key without an AVAILABLE key means a retry will believe the semaphore exists and wait for an AVAILABLE key that never comes.

…ist will obviate it anyway, and keys can expire during a MULTI transaction anyway. Restore the removed statement that always resets the key expiration prior to attempting to obtain a lock. This prevents an orphaned EXISTS key that never expires.
@jcalvert (Contributor, Author) commented Feb 4, 2016

@dv I believe I have cleaned up the pull request. Thanks!

@danielnc commented: 👍 for this

@jasonl commented Nov 25, 2016

Has there been any further work on this? We're running into this issue on production as well - I'd be happy to take a look at the code if this has been abandoned.

@jcalvert (Contributor, Author) commented:
@jasonl AFAIK there's no further work to be done. I did the HoundCI cleanup and the tests pass; whatever Travis failures are there seem to be unrelated. We used a fork with my patch in production at my previous employer.

5 participants