Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example of helm chart for vllm deployment on k8s #9199

Merged
merged 90 commits into from
Dec 10, 2024

Conversation

mfournioux
Copy link
Contributor

@mfournioux mfournioux commented Oct 9, 2024

This PR adds an example of helm chart to deploy vllm on Kubernetes cluster. The goal of this PR is to have a deployment example in order to have the best configuration for k8s.

This example implements an autonomous deployment for vllm with k8s probes (startup, readiness, liveness) which will wait for model to be fully loaded and then marks the pod with running status when the health checkpoint return 200.

As shown in the figure in readme file, the deployment follows two steps :

  • Step 1 : Load the model from an S3 to a volume. An init container is launched and waits for a job to load the model from an S3 to a volume

  • Step 2 : Launch VLLM engine. Once the model is loaded on the volume, Vllm is launched.

This deployment will launch two containers :

  1. The init containers which will be marked with completed status once the download job is done.

  2. A containers hosting vllm engine which will be marked with pending status when init container is ongoing, and will be marked with running status once init container is completed.

FIX #6073

Copy link

github-actions bot commented Oct 9, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@russellb
Copy link
Collaborator

russellb commented Oct 9, 2024

Thanks for the PR. Do you have any thoughts on how / where this could be tested to ensure it remains functional?

@mfournioux mfournioux changed the title Add example of chart helm for vllm deployment on k8s Add example of helm chart for vllm deployment on k8s Oct 10, 2024
@mfournioux
Copy link
Contributor Author

mfournioux commented Oct 10, 2024

Thanks for the PR. Do you have any thoughts on how / where this could be tested to ensure it remains functional?

I propose the following tests to be launched in a github workflow on every pull request :

  • Lint helm chart
  • Validate Kubernetes Manifests with Kubeconform
  • Create kind cluster in order to have a local Kubernetes cluster using Docker for test environment
  • Test helm install
  • Test the service with a post request and ensure we have a 200 response.

@russellb
Copy link
Collaborator

Thanks for the PR. Do you have any thoughts on how / where this could be tested to ensure it remains functional?

I propose the following tests to be launched in a github workflow on every pull request :

  • Lint helm chart
  • Validate Kubernetes Manifests with Kubeconform
  • Create kind cluster in order to have a local Kubernetes cluster using Docker for test environment
  • Test helm install
  • Test the service with a post request and ensure we have a 200 response.

That sounds good to me. Is that something you'd be willing to work on? Since the PR content is pretty standalone, it shouldn't be at much risk of going into conflict in the meantime.

@mfournioux
Copy link
Contributor Author

Thanks for the PR. Do you have any thoughts on how / where this could be tested to ensure it remains functional?

I propose the following tests to be launched in a github workflow on every pull request :

  • Lint helm chart
  • Validate Kubernetes Manifests with Kubeconform
  • Create kind cluster in order to have a local Kubernetes cluster using Docker for test environment
  • Test helm install
  • Test the service with a post request and ensure we have a 200 response.

That sounds good to me. Is that something you'd be willing to work on? Since the PR content is pretty standalone, it shouldn't be at much risk of going into conflict in the meantime.

Sure, I can work on implementing these tests.

@mfournioux
Copy link
Contributor Author

mfournioux commented Nov 20, 2024

@russellb I just added some github workflows to implement functional tests on the chart helm :

  • lint test of the chart
  • Input Validation with values.schema.json
  • setup a local minio to mock an s3 and download the model opt-125m for test
  • setup a local k8s cluster with kind
  • deploy the chart helm on the kind cluster (cpu only, I have used the cpu vllm docker image)
  • curl test to check if the service is correctly responding

All these tests have been implemented in a github worflow.

@simon-mo @khluu how can I have the rights to add these tests I have implemented in the vllm github worflows? Do you prefer I migrate them into buildkite?

Copy link

mergify bot commented Nov 20, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mfournioux.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 20, 2024
@mfournioux mfournioux closed this Nov 20, 2024
@mfournioux mfournioux force-pushed the add_chart_helm_example branch from 8d421da to 63f1fde Compare November 20, 2024 11:17
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
@mfournioux mfournioux requested a review from russellb December 4, 2024 19:01
Signed-off-by: Maxime Fournioux <[email protected]>
@russellb
Copy link
Collaborator

russellb commented Dec 4, 2024

Thanks for addressing all of my feedback! I've pinged some maintainers to take a look.

@mfournioux
Copy link
Contributor Author

Thanks for addressing all of my feedback! I've pinged some maintainers to take a look.

Many thanks!

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spending the time and effort on setting this up!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 10, 2024 04:18
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 10, 2024
@DarkLight1337 DarkLight1337 merged commit fe2e10c into vllm-project:main Dec 10, 2024
33 of 34 checks passed
@mfournioux
Copy link
Contributor Author

Thanks for spending the time and effort on setting this up!

You are welcome!

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Add readiness endpoint /ready and return /health earlier (vLLM on Kubernetes)
4 participants