[Feature] Don't submit scale requests if the worker group is suspended #2666

Open
1 of 2 tasks
kevin85421 opened this issue Dec 18, 2024 · 1 comment · May be fixed by ray-project/ray#49768
@kevin85421
Member

kevin85421 commented Dec 18, 2024

Search before asking

  • I searched the issues and found no similar feature request.

Description

#2663 adds a new `suspend` field to each worker group. The Ray Autoscaler should not submit any scale requests for worker groups that are suspended.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@rueian
Contributor

rueian commented Jan 8, 2025

@kevin85421 Let me take this.

@kevin85421 kevin85421 assigned rueian and unassigned kevin85421 Jan 8, 2025
rueian added a commit to rueian/ray that referenced this issue Jan 10, 2025
Resolves ray-project/kuberay#2666.

ray-project/kuberay#2663 adds a new `suspend` field to the
KubeRay worker group spec for suspending worker groups. A suspended worker group should
be scaled to 0 and never be scaled up until the group is resumed.

Since the `available_node_types` definition has no equivalent of `suspend`,
the best way to tell the autoscaler that a worker group has been suspended is to
set both its `max_workers` and its `min_workers` to 0.

This PR makes the KubeRay autoscaling config producer emit a config with
both `max_workers` and `min_workers` set to 0 for any suspended worker group.
The autoscaler periodically reads this config and acts on it.

Signed-off-by: Rueian <[email protected]>
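
The approach described in the commit message can be illustrated with a minimal Python sketch. This is not the code from the PR: the function name `build_available_node_types` is hypothetical, the input is assumed to be a plain list of worker group dicts carrying `groupName`, `minReplicas`, `maxReplicas`, and the new `suspend` field, and the output keeps only the two keys discussed above.

```python
from typing import Any, Dict, List


def build_available_node_types(
    worker_groups: List[Dict[str, Any]],
) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: map worker group specs to node type limits.

    Assumes each spec is a dict with "groupName" and, optionally,
    "minReplicas", "maxReplicas", and "suspend".
    """
    node_types: Dict[str, Dict[str, int]] = {}
    for group in worker_groups:
        if group.get("suspend", False):
            # A suspended group must stay at 0 workers and never scale up,
            # so both bounds are pinned to 0.
            min_workers, max_workers = 0, 0
        else:
            min_workers = int(group.get("minReplicas", 0))
            max_workers = int(group.get("maxReplicas", 0))
        node_types[group["groupName"]] = {
            "min_workers": min_workers,
            "max_workers": max_workers,
        }
    return node_types


if __name__ == "__main__":
    groups = [
        {"groupName": "gpu-group", "minReplicas": 1, "maxReplicas": 4, "suspend": True},
        {"groupName": "cpu-group", "minReplicas": 1, "maxReplicas": 3},
    ]
    print(build_available_node_types(groups))
```

Because the autoscaler only sees `max_workers` equal to 0 for the suspended group, it has no room to request nodes of that type, which is why no separate "suspended" flag is needed on the autoscaler side under this approach.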