ais-operator: cannot shutdown cluster #200
Comments
Here are some logs for reference
This is intended for now; it helps us keep everything in one place for reference. Thanks for opening the issue! I'll try to replicate and report back.
I am able to replicate the issue. The operator first attempts to shut down the AIS cluster gracefully before scaling down the statefulset, and it requires the cluster to stop responding to requests before it starts that scaling. But if k8s restarts the unresponsive pods, they resume responding. AIS itself has no concept of a "shutdown" state, so if this happens the operator gets stuck waiting for the cluster to quit answering. The operator does not repeat the shutdown call (and if it did, we'd likely see the same thing), so it's just stuck waiting on something that will never happen. To resolve we need to
I'll test out these changes and hopefully we can get a fix out in a release later this week or next.
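For anyone following along, the stuck wait described above can be reproduced with a toy model. This is only a sketch: `FakeCluster`, `reconcile_shutdown`, and `max_polls` are hypothetical names made up for illustration and are not the operator's actual code or API. The poll limit exists only so the example terminates; per the description above, the real operator simply keeps waiting.

```python
class FakeCluster:
    """Toy stand-in for an AIS cluster (hypothetical, not real operator code)."""

    def __init__(self, pods_restart: bool):
        self.pods_restart = pods_restart  # does k8s restart exited pods?
        self.responding = True
        self.replicas = 3

    def shutdown(self) -> None:
        # Graceful shutdown: daemons exit and stop answering requests.
        self.responding = False

    def is_responding(self) -> bool:
        # If k8s restarts the exited pods, they answer again; the cluster
        # keeps no memory of having been asked to shut down.
        if not self.responding and self.pods_restart:
            self.responding = True
        return self.responding


def reconcile_shutdown(cluster: FakeCluster, max_polls: int = 5) -> str:
    """Issue the shutdown once, then wait for the cluster to go quiet."""
    cluster.shutdown()  # sent a single time, never repeated
    for _ in range(max_polls):
        if not cluster.is_responding():
            cluster.replicas = 0  # scale the statefulset down
            return "shutdown"
    return "shutting down"  # still waiting; scaling never starts


stuck = FakeCluster(pods_restart=True)
clean = FakeCluster(pods_restart=False)
print(reconcile_shutdown(stuck), stuck.replicas)
print(reconcile_shutdown(clean), clean.replicas)
```

In this model the scale-down only happens when the pods stay down after the shutdown call. As soon as they are restarted before the next poll, the wait condition is never observed and the replicas are left untouched.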
Awesome! Thanks!
Is there an existing issue for this?
Describe the bug
Hi, I originally wanted to report this in the ais-operator project, but that project does not have issues enabled.
When the AIStoreSpec.ShutdownCluster field is updated, the pods exit but are then restarted, so the AIStore CR stays stuck in the shutting-down state.
Expected Behavior
The operator should continue by scaling the replicas to 0 and then updating the AIStore CR to the shutdown state.
Current Behavior
The AIStore CR remains stuck in the shutting-down state.
Steps To Reproduce
Possible Solution
Could the AIS daemons shut down only the service without exiting the process, so that they retain the shutting-down state?
Additional Information/Context
No response
AIStore build/version
main
Environment details (OS name and version, etc.)
Linux