
Run resource executions in parallel #105

Closed
wants to merge 1 commit

Conversation

pauldraper

Solves #80

For me, it reduces an S3 diff from 140 to 20 seconds over wifi.

$puts_mutex.synchronize {
  old_puts(*args)
}
end
Author


Otherwise, Ruby starts printing partial lines

Instead of

Line 1
Line 2
Line 3

it prints

Line 1Line 2

Line 3

@dtorgy
Contributor

dtorgy commented Dec 11, 2015

AWS actually limits the number of API calls that we can make in any given second. This limit applies to the entire account. Running these in parallel will likely cause us to hit our API limits, which will result in random S3 calls failing.

@krjackso
Contributor

So I went and did a smoke test of this, and it does speed up S3 quite a bit. It also makes us hit our rate limit on ELB and autoscaling, and it's eating exceptions. After I made the following change in each_difference, we saw rate limits right away:

pool.post do
  begin
    if !aws_resources.include?(key)
      f.call(key, [added_diff(resource)])
    else
      f.call(key, diff_resource(resource, aws_resources[key]))
    end
  rescue => e
    puts "Exception: #{e}"
  end
end
keilan@keilan:~/lucid/cumulus-paul$ time ./bin/cumulus.rb --root /var/lucid/ops/scripts/cumulus/ --config /var/lucid/ops/scripts/cumulus/configuration.json autoscaling diff
AutoScaling Group SupportToolsWebGroup has the following changes:
    Health check type: AWS - EC2, Local - ELB
    Health check grace period: AWS - 900, Local - 600
AutoScaling Group DocumentService has the following changes:
    Health check type: AWS - EC2, Local - ELB
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded
Exception: Rate exceeded

This could be workable if we stop eating exceptions (some exceptions really do need to stop execution), if we also avoid doing the sync in parallel (each_difference is used by both diffing and syncing), and if we only enable parallelism on the modules that need it (S3 is probably okay).
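One way to keep the parallelism without eating exceptions is to record the first worker failure and re-raise it after the pool drains. A plain-thread sketch — `parallel_each` is a hypothetical helper, not Cumulus's actual API:

```ruby
# Run a block over items in parallel, surfacing the first exception
# instead of swallowing it. Plain threads stand in for the pool.
def parallel_each(items, parallelism: 5)
  queue = Queue.new
  items.each { |item| queue << item }

  first_error = nil
  error_mutex = Mutex.new

  workers = parallelism.times.map do
    Thread.new do
      loop do
        break if first_error        # stop picking up work after a failure
        item = begin
          queue.pop(true)           # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        begin
          yield item
        rescue => e
          error_mutex.synchronize { first_error ||= e }
        end
      end
    end
  end

  workers.each(&:join)
  raise first_error if first_error  # re-raise instead of swallowing
end
```

With this shape, `parallel_each(keys) { |key| ... }` finishes in-flight work, then propagates the first exception to the caller, so a real failure still stops execution.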

@msiebert
Contributor

Also, do we need Paul's sublime text config in the repo?

@pauldraper
Author

it's eating Exceptions

I'd expect nothing less from Ruby.

@pauldraper
Author

Also, do we need Paul's sublime text config in the repo?

No...it's not in the .gitignore :( I'll add it.

@pauldraper
Author

AWS actually limits us on the number of API calls that we can make in any given second.

This won't be hard to change. Where are the limits documented?

@pauldraper
Author

@krjackso, I think I've fixed the issues.

  • We catch the first exception, shut down the thread pool, and raise it.
  • AWS doesn't document the limits for all of their APIs, beyond the fact that they exist. They say the best way to handle that is retries with exponential backoff, which the Ruby client already does. I made the number of retries configurable, with a suggested 5 instead of the default 3. In fact, I made the client config object accept any of the config parameters for Ruby's AWS client.
  • I made the parallelism configurable, with a suggested default of 5 rather than 10.
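For reference: in aws-sdk-ruby v2, the retry count is an ordinary client option, so a config passthrough amounts to handing the parsed options straight to the client constructor. A sketch only — the values are illustrative, and nothing here reflects Cumulus's actual config schema:

```ruby
require "aws-sdk"  # aws-sdk v2

# Any option from the client config can be forwarded verbatim.
client_options = {
  retry_limit: 5,        # SDK retries with exponential backoff (default is 3)
  region: "us-east-1"
}

s3 = Aws::S3::Client.new(client_options)
```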

@krjackso
Contributor

The problem I'm seeing with the retries is that we are still hitting the throttling limit, just trying more times. I'm not sure that solution will work for something like ELB, where we expect not to be throttled in other places; @dtorgy should be able to make that decision, though. If we don't want parallelism for the modules we get rate limited on, we could pretty easily have an opt-out value in config per module. Syncing in parallel shouldn't cause a problem that I can think of... we don't guarantee the order when syncing anyway, so it should be no different.

@pauldraper
Author

The problem I'm seeing with the retries still is that we are still hitting the throttling limit, but just trying more times.

Retries are actually the AWS-recommended solution to hitting their undocumented limits.

If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential back-off retries.

But we don't save much on ELBs anyway -- they take seconds not minutes to diff -- so I'm fine with setting the parallelism per API.
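The exponential back-off AWS recommends looks like this in plain Ruby — a generic sketch, not the SDK's internal implementation, which is already wired up behind its retry option:

```ruby
# Retry a block with exponential backoff, the pattern AWS recommends
# for throttling errors like "Rate exceeded".
def with_backoff(max_retries: 5, base_delay: 0.3)
  attempts = 0
  begin
    yield
  rescue
    attempts += 1
    raise if attempts > max_retries            # give up: re-raise the error
    sleep(base_delay * (2**(attempts - 1)))    # 0.3s, 0.6s, 1.2s, ...
    retry
  end
end

# Example: succeed on the third attempt.
calls = 0
with_backoff(base_delay: 0.01) do
  calls += 1
  raise "Rate exceeded" if calls < 3
end
puts "succeeded after #{calls} calls"  # prints "succeeded after 3 calls"
```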

4 participants