
Run resource executions in parallel #105

Closed
wants to merge 1 commit

Conversation

pauldraper

Solves #80

For me, it reduces an S3 diff from 140 to 20 seconds over wifi.

$puts_mutex.synchronize {
  old_puts(*args)
}
end
Author


Otherwise, Ruby starts printing partial lines

Instead of

Line 1
Line 2
Line 3

it prints

Line 1Line 2

Line 3

@dtorgy
Contributor

dtorgy commented Dec 11, 2015

AWS actually limits the number of API calls that we can make in any given second. This limit applies to the entire account. Running these in parallel will likely cause us to hit our API limits, which will result in random S3 calls failing.

@krjackso
Contributor

So I went and did a smoke test of this, and it does speed up S3 quite a bit. It also makes us hit our rate limit on ELB and autoscaling, and it's eating exceptions. After I made the following change in each_difference, we saw rate limits right away:

pool.post do
  begin
    if !aws_resources.include?(key)
      f.call(key, [added_diff(resource)])
    else
      f.call(key, diff_resource(resource, aws_resources[key]))
    end
  rescue => e
    puts "Exception: #{e}"
  end
end
keilan@keilan:~/lucid/cumulus-paul$ time ./bin/cumulus.rb --root /var/lucid/ops/scripts/cumulus/ --config /var/lucid/ops/scripts/cumulus/configuration.json autoscaling diff
AutoScaling Group SupportToolsWebGroup has the following changes:
    Health check type: AWS - EC2, Local - ELB
    Health check grace period: AWS - 900, Local - 600
AutoScaling Group DocumentService has the following changes:
    Health check type: AWS - EC2, Local - ELB
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded

This could be workable if we stop eating exceptions (some exceptions really do need to stop execution), if we also avoid doing the sync in parallel (each_difference is used by both diffing and syncing), and if we only enable parallelism on the modules that need it (S3 is probably okay).
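One way to keep the parallelism without eating exceptions is to record the first worker failure and re-raise it after the pool drains. A plain-thread sketch — `parallel_each` is a hypothetical helper, not Cumulus's actual API:

```ruby
# Run a block over items in parallel, surfacing the first exception
# instead of swallowing it. Plain threads stand in for the pool.
def parallel_each(items, parallelism: 5)
  queue = Queue.new
  items.each { |item| queue << item }

  first_error = nil
  error_mutex = Mutex.new

  workers = parallelism.times.map do
    Thread.new do
      loop do
        break if first_error        # stop picking up work after a failure
        item = begin
          queue.pop(true)           # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        begin
          yield item
        rescue => e
          error_mutex.synchronize { first_error ||= e }
        end
      end
    end
  end

  workers.each(&:join)
  raise first_error if first_error  # re-raise instead of swallowing
end
```

With this shape, `parallel_each(keys) { |key| ... }` finishes in-flight work, then propagates the first exception to the caller, so a real failure still stops execution.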

@msiebert
Contributor

Also, do we need Paul's sublime text config in the repo?

@pauldraper
Author

it's eating Exceptions

I'd expect nothing less from Ruby.

@pauldraper
Author

Also, do we need Paul's sublime text config in the repo?

No...it's not in the .gitignore :( I'll add it.

@pauldraper
Author

AWS actually limits us on the number of API calls that we can make in any given second.

This won't be hard to change. Where are the limits documented?

@pauldraper
Author

@krjackso, I think I've fixed the issues.

  • We catch the first exception, shut down the thread pool, and raise it.
  • AWS doesn't document the limits for all of their APIs, beyond the fact that they exist. They say the best way to handle that is retries with exponential backoff, which the Ruby client already does. I made the number of retries configurable, with a suggested 5 instead of the default 3. In fact, I made the client config object accept any of the config parameters for Ruby's AWS client.
  • I made the parallelism configurable, with a suggested default of 5 rather than 10.
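For reference: in aws-sdk-ruby v2, the retry count is an ordinary client option, so a config passthrough amounts to handing the parsed options straight to the client constructor. A sketch only — the values are illustrative, and nothing here reflects Cumulus's actual config schema:

```ruby
require "aws-sdk"  # aws-sdk v2

# Any option from the client config can be forwarded verbatim.
client_options = {
  retry_limit: 5,        # SDK retries with exponential backoff (default is 3)
  region: "us-east-1"
}

s3 = Aws::S3::Client.new(client_options)
```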

@krjackso
Contributor

The problem I'm seeing with the retries is that we are still hitting the throttling limit, just trying more times. I'm not sure that solution will work for something like ELB, where we expect not to be throttled in other places; @dtorgy should be able to make that decision, though. If we don't want parallelism for the modules we get rate limited on, we could pretty easily have an opt-out value in config per module. Syncing in parallel shouldn't cause a problem that I can think of... we don't guarantee the order when syncing anyway, so it should be no different.

@pauldraper
Author

The problem I'm seeing with the retries still is that we are still hitting the throttling limit, but just trying more times.

Retries are actually the AWS-recommended solution to hitting their undocumented limits.

If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential back-off retries.

But we don't save much on ELBs anyway -- they take seconds not minutes to diff -- so I'm fine with setting the parallelism per API.
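The exponential back-off AWS recommends looks like this in plain Ruby — a generic sketch, not the SDK's internal implementation, which is already wired up behind its retry option:

```ruby
# Retry a block with exponential backoff, the pattern AWS recommends
# for throttling errors like "Rate exceeded".
def with_backoff(max_retries: 5, base_delay: 0.3)
  attempts = 0
  begin
    yield
  rescue
    attempts += 1
    raise if attempts > max_retries            # give up: re-raise the error
    sleep(base_delay * (2**(attempts - 1)))    # 0.3s, 0.6s, 1.2s, ...
    retry
  end
end

# Example: succeed on the third attempt.
calls = 0
with_backoff(base_delay: 0.01) do
  calls += 1
  raise "Rate exceeded" if calls < 3
end
puts "succeeded after #{calls} calls"  # prints "succeeded after 3 calls"
```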

4 participants