
Setting up Llamadeploy for multiagent deployment on k8s #357

Open · hz6yc3 opened this issue Nov 11, 2024 · 8 comments

hz6yc3 commented Nov 11, 2024

There is no documentation that provides guidance on how to set up Llama Deploy (control plane, message queue and service deployment) on Kubernetes. The example provided in the code is a little confusing, and our company badly needs some guidance on setting up Llama Deploy for an enterprise deployment. Any relevant documentation or sample configuration that someone can share would be really helpful.

masci added this to Framework Nov 11, 2024
masci self-assigned this Nov 12, 2024
hz6yc3 (Author) commented Nov 12, 2024

@masci thanks a lot for looking into my question above. We are kind of blocked, and there is some urgency in completing the PoC for agentic workflows using LlamaIndex, so we would greatly appreciate it if you could provide some guidance on the request above.

logan-markewich (Collaborator) commented

@hz6yc3 while it might not be totally clear from the docs/examples, it's fairly straightforward. You'd need to use the lower-level API:
https://docs.llamaindex.ai/en/stable/module_guides/llama_deploy/30_manual_orchestration/

Basically, you can set up a Docker image that deploys the core (the control plane and message queue)

Then another Docker image that deploys a workflow service (or several, depending on how you want to manage scaling)

Once you have it running in Docker, it's fairly straightforward to then launch those Docker images in a k8s cluster
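
For illustration, the two container entrypoints might look roughly like this. This is a minimal sketch based on the lower-level API from the manual-orchestration docs linked above; the hosts, ports, service name, and the EchoWorkflow class are placeholders, so check the docs for the exact signatures in your version:

```python
# core.py -- entrypoint for the "core" image (control plane + message queue)
import asyncio

from llama_deploy import ControlPlaneConfig, SimpleMessageQueueConfig, deploy_core


async def main() -> None:
    # Start the control plane and the simple message queue in this container.
    await deploy_core(
        control_plane_config=ControlPlaneConfig(host="0.0.0.0", port=8000),
        message_queue_config=SimpleMessageQueueConfig(host="0.0.0.0", port=8001),
    )


if __name__ == "__main__":
    asyncio.run(main())
```

and the workflow-service entrypoint:

```python
# workflow_service.py -- entrypoint for a workflow-service image
import asyncio

from llama_deploy import ControlPlaneConfig, WorkflowServiceConfig, deploy_workflow

from my_app.workflows import EchoWorkflow  # placeholder for your own Workflow subclass


async def main() -> None:
    # Run the workflow as a service and register it with the (already running)
    # control plane; "core-service" is a placeholder for your k8s Service name.
    await deploy_workflow(
        workflow=EchoWorkflow(),
        workflow_config=WorkflowServiceConfig(
            host="0.0.0.0", port=8002, service_name="echo_workflow"
        ),
        control_plane_config=ControlPlaneConfig(host="core-service", port=8000),
    )


if __name__ == "__main__":
    asyncio.run(main())
```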

This example walks through all of this, including k8s
https://github.com/run-llama/llama_deploy/tree/main/examples/message-queue-integrations
https://github.com/run-llama/llama_deploy/tree/main/examples/message-queue-integrations/rabbitmq/kubernetes
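
Once those pieces are running, a client only needs to reach the control plane. A minimal sketch (the host, port, service name, and workflow kwargs are placeholders):

```python
from llama_deploy import ControlPlaneConfig, LlamaDeployClient

# Point the client at wherever the control plane is exposed (placeholder host/port).
client = LlamaDeployClient(ControlPlaneConfig(host="core-service", port=8000))

# Create a session and run a task against the registered workflow service.
session = client.create_session()
result = session.run("echo_workflow", message="hello")
print(result)
```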

We are working on updates to make this easier, though, using a simpler top-level YAML file rather than writing code for all the deployments. But in lieu of that, the above is the best approach.

hz6yc3 (Author) commented Nov 12, 2024

@logan-markewich thanks a lot! Let me read through the documents. We were not sure about the guidance for centrally deploying the core components, because based on the architecture in the documentation it seemed like we have to deploy the core components (control plane, message queue) separately for each deployment. The way we deploy applications in our company, every application gets its own namespace on the cluster, so we weren't sure how to set up the deployment pattern using Llama Deploy.

rehevkor5 commented

Yeah, https://www.llamaindex.ai/blog/introducing-llama-deploy-a-microservice-based-way-to-deploy-llamaindex-workflows is somewhat misleading on these points:

"llama-deploy launches llama-index Workflows as scalable microservices"

"everything's an independently-scalable microservice"

"microservices architecture of llama-deploy enables easy scaling of individual components, ensuring your system can handle growing demands"

If you use the API Server / llamactl, the control plane and all the services are run in-process, either as asyncio tasks or as a uvicorn HTTP server. So it's inherently centralized and not independently scalable. If you want your services to be independently scalable, you have to implement your own solution for that.

masci (Member) commented Nov 16, 2024

@rehevkor5 first of all, thanks for the feedback!

What you read in the article is still true, but it dates back to before we introduced the apiserver (see how we changed the architecture diagram here), so I see how this can be misleading. A quick recap to clarify the situation:

  • If you manually orchestrate the different components you see in the diagram, your system is consistent with what's in the article (every component is independent and talks to the others via HTTP, hence scalable and close to an actual microservice).
  • If you're using the apiserver, the components are wrapped into a single thread, so they can't scale independently.

Why is the apiserver monolithic, then? The apiserver is a key component of what we want Llama Deploy to become in terms of user experience. We wanted to quickly validate the concept of "deployments" and their YAML definition with our users and get feedback as soon as possible, so we optimized the current "backend" of the apiserver for running in a single-process/single-container environment that was easy to set up.

But we're already planning an actual scalable implementation of the apiserver backend; currently we're leaning towards building on top of existing container orchestrators to move faster and avoid reinventing the wheel.

I'll expand the docs to include these considerations and call out that the apiserver is a work in progress. Let me know if you have any questions!

hz6yc3 (Author) commented Nov 16, 2024

@masci it sounds like your suggested approach for now is manual orchestration to deploy the individual components, until a scalable solution using the apiserver is developed. Based on the updated architecture diagram you shared, it sounds like we have to create separate "deployments", each with its own control plane and message queue config, for deploying the associated workflows?

abdulhuq-cimulate commented Nov 19, 2024

@masci I have been working on a POC to set up Llama Deploy workflows using the manual orchestration approach. I did manage to set it up with docker-compose, using a custom Docker image, with both the simple message queue and Redis. As the next step of the POC, I tried deploying the services to k8s. The setup I was going for is a centralized deployment of the control plane and message queue (with multiple replicas), workflows deployed as a separate deployment (with multiple replicas), and the workflows registered with the central control plane.
I believe I am running into issues because each replica of the control plane might have its own isolated service metadata, and each pod of the workflow deployment might register its own version of the service on the control plane.

Is there a way to share the service metadata across the control plane replicas and to register only one instance of the workflow?

In the meantime, I can scale my replicas down to 1 to mitigate the issue, but I'm curious to see whether there is already a fix available.

Edit: A quick fix could be to allow passing in, via an env var like CONTROL_PLANE_SERVICE_KV_STORE_URI, the URI of a KV store running as a separate service for the control plane to use here.

masci (Member) commented Nov 20, 2024

Edit: A quick fix could be to allow passing in, via an env var like CONTROL_PLANE_SERVICE_KV_STORE_URI, the URI of a KV store running as a separate service for the control plane to use here.

Yes, I believe that would be the solution. We already have a bunch of stores that can run as separate services (https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/storage/kvstore), so for example you could use the Redis implementation.
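
To sketch the idea (this is not an existing feature): the env var name is the one proposed above, and whether the control plane accepts an external state store is an assumption to verify against the ControlPlaneServer constructor in the version you're running:

```python
# Hypothetical sketch: back the control plane's state with Redis so replicas share it.
import os

from llama_index.storage.kvstore.redis import RedisKVStore

# CONTROL_PLANE_SERVICE_KV_STORE_URI is the proposed env var, not an existing setting.
kv_uri = os.environ.get("CONTROL_PLANE_SERVICE_KV_STORE_URI", "redis://localhost:6379")
state_store = RedisKVStore(redis_uri=kv_uri)

# Then, when building the control plane in the core entrypoint, pass the store in,
# assuming the constructor exposes a `state_store` argument (verify before relying on it):
# control_plane = ControlPlaneServer(message_queue, orchestrator, state_store=state_store, ...)
```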

I'll look into it, tracking the feature with #370
