ec2 ocf resource retry #33

nasjomach · 2021-09-30T09:10:45Z

Concerns: cluster-glue/lib/plugins/stonith/external/ec2

It seems to me that there is no retry mechanism in the EC2 OCF script.
AWS EC2 API calls can be throttled if more than 10000 API requests per second are made.
In that case the script does not report any status, the resource is considered to be in a bad state, and the STONITH device ends up stopped.

Running a "resource cleanup" operation brings the STONITH device back into an operational state after such failures.

/var/log/messages
2021-09-16T16:02:04.751248+00:00 external/ec2(res_AWS_STONITH)[31700]: info: status check for is
<-- Missing instance status report after "is" keyword

2021-09-16T16:02:04.760725+00:00 external/ec2(res_AWS_STONITH)[31694]: WARN: Already fenced (Instance status = ). Aborting fence attempt.
2021-09-16T16:02:13.742017+00:00 external/ec2(res_AWS_STONITH)[32004]: ERROR: Operation status failed: 1

Some kind of fault tolerance would be nice to have, I guess.

dmuhamedagic · 2021-10-06T14:30:32Z

IIRC, none of the stonith plugins do that, i.e. run in a loop until the status is correct, so this would set a precedent. A question: how often do you check the status? If it's too often and the device (in this case AWS) is flaky, then you may try increasing the interval.

Thr3d · 2022-03-28T18:24:42Z

#35 Addresses this.
The API bucket the agent uses is shared across the account's whole region and is fairly small, so simply extending the interval doesn't help much beyond a certain point.
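
For illustration only, here is a minimal sketch of what a retry with exponential backoff around the status query could look like. It is not the change made in #35 and not code from the shipped plugin: the names `retry_nonempty` and `instance_status` are hypothetical, and the `aws ec2 describe-instances` query is an assumption about how the status is fetched. The key point is that an empty result (as in the log above, where nothing follows "is") is treated as a failure and retried rather than interpreted as an instance state.

```shell
#!/bin/bash
# Hypothetical helper, not part of the shipped external/ec2 plugin.
# Usage: retry_nonempty MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
# Succeeds only if the command exits 0 AND prints something, so a
# throttled call that yields an empty string is retried.
# BASE_DELAY_SECONDS must be a whole number; it doubles after each failure.
retry_nonempty() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1 out
    while :; do
        if out=$("$@") && [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1    # give up; the caller decides how to report it
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Assumed status query; the real plugin may pass other options
# (for example a profile) or use a different --query expression.
instance_status() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' --output text
}
```

A caller could then write something like `status=$(retry_nonempty 5 1 instance_status "$instance_id") || exit 1`, which waits 1, 2, 4 and 8 seconds between the five attempts. The total wait has to stay below the operation timeout configured for the STONITH resource, otherwise the cluster will fail the operation anyway.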
