[Serving Catalog] Add llama3.1-405b vLLM GKE support with LWS #11
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Edwinhr716. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
until ray status --address $LWS_LEADER_ADDRESS:6380; do
  sleep 5;
done
entrypoint.sh: |-
@jjk-g can we propose to vllm to create containers that embed this logic? We can create an issue on the vllm repo.
+1, vllm support for creating the Ray cluster via their container is preferred.
An issue has been created on the vllm repo: vllm-project/vllm#8302
Since the issue hasn't been addressed, would it be better to merge this PR right now, and modify it later once a multihost vllm image is created?
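For context, the logic under discussion is roughly the following, shown as the entrypoint ConfigMap entry the manifest embeds. This is a hedged sketch, not the exact script in this PR: the leader/worker split via the script's first argument, the Ray port, and the vLLM flags and parallelism variables are illustrative assumptions, while LWS_LEADER_ADDRESS, LWS_GROUP_SIZE, and MODEL_ID come from the snippets quoted in this conversation.

```yaml
# Hedged sketch of the multi-host startup flow being discussed, not the exact
# script merged in this PR.
entrypoint.sh: |-
  #!/bin/bash
  if [ "$1" = "leader" ]; then
    # Leader: start the Ray head, wait for all group members to register,
    # then launch the vLLM OpenAI-compatible server.
    ray start --head --port=6380
    until [ "$(ray status | grep -c node_)" -ge "$LWS_GROUP_SIZE" ]; do
      sleep 5
    done
    python3 -m vllm.entrypoints.openai.api_server \
      --model "$MODEL_ID" \
      --tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
      --pipeline-parallel-size "$PIPELINE_PARALLEL_SIZE"
  else
    # Worker: wait for the leader's Ray head to come up, then join and block.
    until ray status --address "$LWS_LEADER_ADDRESS:6380"; do
      sleep 5
    done
    ray start --address "$LWS_LEADER_ADDRESS:6380" --block
  fi
```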
serving-catalog/core/lws/vllm/llama3-405b/gke/kustomization.yaml (outdated; resolved)
      key: hf_api_token
- name: MODEL_ID
  valueFrom:
    configMapKeyRef:
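For illustration, the pattern being discussed looks roughly like the sketch below; the ConfigMap name and key follow the snippets quoted in this PR, while the model value is an assumption rather than what the PR necessarily ships.

```yaml
# Hedged sketch, not the exact manifest in this PR: one ConfigMap holds the
# values shared by the leader and worker templates, and both templates
# reference it through configMapKeyRef as in the diff excerpt above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-multihost-config
data:
  model_id: meta-llama/Meta-Llama-3.1-405B-Instruct  # assumed value for illustration
```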
Using a ConfigMap is inflating the LWS YAML; assuming we can bake the logic into the container, do we still need it?
Don't need it, although in practice I liked defining the environment variables shared across leaders/workers in one place.
Nice!
One serving-catalog style nit: let's have file extensions for resources be .yaml the first time the resource is introduced in the hierarchy (in a base), and .patch.yaml in overlays that patch the existing base resources.
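A hedged illustration of that convention; the file names and directory layout below are hypothetical, not the exact layout in this PR.

```yaml
# base/kustomization.yaml -- the resource is introduced here, so it uses .yaml
resources:
  - lws.yaml
---
# gke/kustomization.yaml -- the overlay patches the base resource, so it uses .patch.yaml
resources:
  - ../base
patches:
  - path: lws.patch.yaml
```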
Switched names to .patch.yaml for files that patch existing resources.
@@ -0,0 +1,5 @@
# LeaderWorkerSet (lws)

In order to run the workloads in this directory you will need to install the lws controller. Instructions on how to do so can be found here:
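For readers following along, installing the controller typically boils down to applying the release manifests; the version and URL pattern below are assumptions from memory of the LWS docs, so follow the instructions linked in the README for the authoritative steps.

```sh
# Hedged sketch only; the version and URL pattern are assumptions -- check the
# LWS installation docs referenced in the README for the current release.
LWS_VERSION=v0.6.1   # hypothetical version
kubectl apply --server-side -f "https://github.com/kubernetes-sigs/lws/releases/download/${LWS_VERSION}/manifests.yaml"
```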
Should this be the only way, or should we also provide StatefulSets so that folks who do not want to deploy LWS have more options?
Adding statefulsets as well is outside the scope of this PR. I'll edit the title of the PR to reflect that it only covers an example using lws
      name: vllm-multihost-config
      key: model_id
- name: PIPELINE_PARALLEL_SIZE
  value: $(LWS_GROUP_SIZE)
Should the user use leaderworkerset.sigs.k8s.io/size to inject it instead? Is there a recommended value for the specific deployment?
I'm not sure if I understand the question correctly. PIPELINE_PARALLEL_SIZE corresponds to the number of nodes that the model is deployed on, so it is the same value as leaderworkerset.sigs.k8s.io/size. The value of $(LWS_GROUP_SIZE) contains the same value as leaderworkerset.sigs.k8s.io/size.
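For reference, the alternative raised above would look roughly like the sketch below; it assumes LWS exposes the group size as the leaderworkerset.sigs.k8s.io/size annotation on the pod, so the value can be injected through the downward API instead of expanding $(LWS_GROUP_SIZE).

```yaml
# Hedged sketch: inject the group size from the pod annotation via the downward
# API. Assumes the leaderworkerset.sigs.k8s.io/size annotation is present on
# the pod (set by the LWS controller).
- name: PIPELINE_PARALLEL_SIZE
  valueFrom:
    fieldRef:
      fieldPath: metadata.annotations['leaderworkerset.sigs.k8s.io/size']
```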