-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decouple Ray Resources: Construct ray k8spods from Resources #2943
base: master
Are you sure you want to change the base?
Conversation
2778db2
to
b79cea4
Compare
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
f2b811c
to
da7d6ae
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2943 +/- ##
===========================================
+ Coverage 79.32% 91.49% +12.16%
===========================================
Files 199 92 -107
Lines 20870 3997 -16873
Branches 2684 0 -2684
===========================================
- Hits 16555 3657 -12898
+ Misses 3566 340 -3226
+ Partials 749 0 -749 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, nothing major.
flytekit/core/resources.py
Outdated
resources_map = { | ||
"cpu": "cpu", | ||
"mem": "memory", | ||
"gpu": "nvidia.com/gpu", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you expose this as a parameter? Set its default value to "nvidia.com/gpu"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the gpu value of resource_map?
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for flip-flopping on this, I only realized that we were removing the pod spec template. We should instead build helper functions around that, but still produce pod specs that get passed to the ray idl objects.
requests: Optional[Resources], | ||
limits: Optional[Resources], | ||
) -> dict[str, Any]: | ||
def _construct_k8s_pods_resources(resources: Optional[Resources], k8s_gpu_resource_key: str = "nvidia.com/gpu"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using other gpus is going to be hard, even if we push this parameter to the outer function (i.e. construct_k8s_pod_spec_from_resources
).
): | ||
self._group_name = group_name | ||
self._replicas = replicas | ||
self._max_replicas = max(replicas, max_replicas) if max_replicas is not None else replicas | ||
self._min_replicas = min(replicas, min_replicas) if min_replicas is not None else replicas | ||
self._ray_start_params = ray_start_params | ||
self._k8s_pod = k8s_pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep this as part of the interface and build helper functions that construct valid pod specs instead (as mentioned in the original flyte PR). This is going to help in the other problem we're having with passing the gpu resource name around (in other words, gpu can be an argument of one of the helper function that builds pod specs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get what you are saying. So we want users to construct the pod specs themself like calling construct_k8s_pod_spec_from_resources()
or specifying pod templates in user code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the method name simple, maybe pod from resources
requests: typing.Optional[Resources] = None, | ||
limits: typing.Optional[Resources] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
@@ -14,14 +15,22 @@ def __init__( | |||
min_replicas: typing.Optional[int] = None, | |||
max_replicas: typing.Optional[int] = None, | |||
ray_start_params: typing.Optional[typing.Dict[str, str]] = None, | |||
k8s_pod: typing.Optional[K8sPod] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, we should keep it. If someone specifies both k8s_pod
and requests
, then we should merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked with Eduardo about this today and agreed to only expose k8s_pod
and let the user use helper functions to construct the k8s_pod like construct_k8s_pod_spec_from_resources()
in this PR
Tracking issue
Related to flyteorg/flyte#5666
Why are the changes needed?
These changes update the flytekit ray plugin to let the user specify Resources for Ray Head & Worker nodes instead of specifying the whole k8spod Object.
What changes were proposed in this pull request?
k8spod
objectrequests[Resources]
andlimits[Resources]
forWorkerNodeconfig
&HeadNodeConfig
construct_k8s_pod_spec_from_resources()
to construct k8spod pod definition from ResourcesHow was this patch tested?
Ray Head node Resources
Ray controller Resources
Ray worker Resources
Check all the applicable boxes
Related PRs
Docs link