Announcing same jobs every 10 seconds #3

jaraco · 2015-06-30T18:00:37Z

Originally reported by: Darwin Monroy (Bitbucket: dmonroy, GitHub: dmonroy)

When pipeline workers are down and there are jobs ready to be taken, mettle starts announcing those jobs in a loop, so the RabbitMQ job queue keeps growing.

Here is the chart of messages after 1 hour of pipeline workers being down (dev env)

And the mettle logs show this every 10 seconds:

#!logs

11:39:29 timer.1         | INFO:mettle.timer:Sleeping for 10 seconds
11:39:22 timer.1         | INFO:mettle.timer:Checking pipelines.
11:39:23 timer.1         | INFO:mettle.timer:Checking jobs.
...
11:39:23 timer.1         | INFO:mettle_protocol.messages:Announcing job .............
...
11:39:29 timer.1         | INFO:mettle.timer:Cleaning up old logs.
11:39:29 timer.1         | INFO:mettle.timer:Finished scheduled tasks.  Took 7.364271 seconds
11:39:29 timer.1         | INFO:mettle.timer:Sleeping for 10 seconds

26000 messages isn't a big number, but for a development environment with just one pipeline (50 jobs) it is huge. A few weeks ago the production MQ server became slow because of millions of messages in the queues; some mettle processes lost connectivity to the MQ server and then died.

Bitbucket: https://bitbucket.org/yougov/mettle/issue/3

jaraco · 2015-06-30T18:01:28Z

Original comment by Darwin Monroy (Bitbucket: dmonroy, GitHub: dmonroy):

Here's the queue chart:

jaraco · 2015-06-30T18:04:55Z

Original comment by Darwin Monroy (Bitbucket: dmonroy, GitHub: dmonroy):

If there are no pipeline workers available, mettle must not announce a job.

jaraco · 2015-06-30T20:03:34Z

Original comment by Darwin Monroy (Bitbucket: dmonroy, GitHub: dmonroy):

2 hours later

jaraco · 2015-07-23T21:22:18Z

Original comment by Brent Tubbs (Bitbucket: btubbs, GitHub: btubbs):

Rather than having the timer and dispatcher try to determine whether there's a worker available to receive a message, I think we should put a TTL on the messages that the timer sends. If the timer is going to re-announce an unclaimed job every 60 seconds, then we can give those announcement messages a 60 second TTL so they're automatically dropped when they're no longer needed.
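
To illustrate why a TTL equal to the re-announce interval keeps the queue bounded, here is a toy, in-memory sketch. It is not mettle's or RabbitMQ's code: `ToyQueue` and `simulate` are hypothetical names, and the 50 jobs, 60 second interval and 1 hour outage are assumptions taken from the figures mentioned in this thread. In RabbitMQ itself, per-message TTL is set through the message's `expiration` property (a string of milliseconds, e.g. `"60000"`).

```python
from collections import deque
from typing import Optional


class ToyQueue:
    """Hypothetical in-memory stand-in for a broker queue with per-message TTL."""

    def __init__(self) -> None:
        # Each entry is (expires_at, body); expires_at is None when no TTL is set.
        self._messages = deque()

    def publish(self, body: str, now: int, ttl: Optional[int] = None) -> None:
        expires_at = None if ttl is None else now + ttl
        self._messages.append((expires_at, body))

    def depth(self, now: int) -> int:
        # Drop every message whose TTL has elapsed, then count what is left.
        self._messages = deque(
            m for m in self._messages if m[0] is None or m[0] > now
        )
        return len(self._messages)


def simulate(jobs: int, interval: int, duration: int, ttl: Optional[int]) -> int:
    """Re-announce every job each `interval` seconds while no worker consumes.

    Returns the peak queue depth observed during `duration` seconds.
    """
    queue = ToyQueue()
    peak = 0
    now = 0
    while now < duration:
        for job_id in range(jobs):
            queue.publish(f"job-{job_id}", now, ttl)
        peak = max(peak, queue.depth(now))
        now += interval
    return peak


if __name__ == "__main__":
    # Assumed scenario: 50 unclaimed jobs, re-announced every 60 s for 1 hour.
    print("peak depth without TTL:", simulate(50, 60, 3600, ttl=None))
    print("peak depth with 60 s TTL:", simulate(50, 60, 3600, ttl=60))
```

Without a TTL the depth in this model grows by one batch of announcements per cycle; with a TTL equal to the interval, each batch expires just as its replacement arrives, so the depth stays at one batch no matter how long the workers are down.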

jaraco · 2015-07-31T00:21:28Z

Original comment by Darwin Monroy (Bitbucket: dmonroy, GitHub: dmonroy):

That's a pretty good idea, I'll work on that.

jaraco added minor bug labels Aug 16, 2017