One important (and non-trivial) aspect of running model servers today is ensuring they can scale horizontally in response to load. Traditional CPU/memory-based autoscaling is not sufficient for this type of workload (see this doc).
In order to help Kubernetes users bootstrap these workloads and operate them at scale, we would like to add HPA configurations to the serving catalog so that the heavy lifting around metrics, adapters, and configuration works out of the box.
Proposal:
Enrich current manifests in the serving catalog repository with HPA configurations.
This will be composed of several pieces (each sketched below the list):
- A local Prometheus instance that scrapes metrics from the model servers directly, or a cloud vendor-specific product that provides the same functionality (for example, GKE could use Google Managed Prometheus).
- A Prometheus Adapter that makes the workload metrics collected by the Prometheus implementation available to the HPA.
- An HPA configuration that scales on one of the model server's metrics (e.g. queue length).
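To make the first piece concrete, here is a minimal sketch of how the local Prometheus could be pointed at the model server pods, assuming the Prometheus Operator is installed and that the pods carry an `app: model-server` label and expose a port named `metrics` (both names are illustrative, not something the catalog defines today). On GKE, Google Managed Prometheus offers an analogous `PodMonitoring` resource.

```yaml
# Sketch: scrape the model server pods via the Prometheus Operator.
# The label selector and port name are assumptions for illustration.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: model-server-metrics
spec:
  selector:
    matchLabels:
      app: model-server        # assumed pod label
  podMetricsEndpoints:
    - port: metrics            # assumed name of the metrics port
      interval: 15s
```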
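For the second piece, a sketch of a Prometheus Adapter rule that exposes a queue-length series through the custom metrics API. The series name `vllm:num_requests_waiting` is used purely as an illustration of a model server queue gauge; each model server would need its own rule.

```yaml
# Sketch: prometheus-adapter rule configuration.
# The source series and the resulting custom-metric name are illustrative assumptions.
rules:
  - seriesQuery: 'vllm:num_requests_waiting{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "vllm:num_requests_waiting"
      as: "queue_length"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```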
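And for the third piece, a sketch of the HPA object itself, scaling a model server Deployment on that custom metric. The Deployment name, replica bounds, and target value are placeholders.

```yaml
# Sketch: HPA driven by the queue_length custom metric exposed above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server           # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_length     # custom metric served by the adapter
        target:
          type: AverageValue
          averageValue: "10"     # illustrative target: avg 10 queued requests per pod
```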
Leverage Kustomize's patching functionality to allow the user to choose which "flavour" of HPA they would like. To start, we could provide two different types (a possible overlay layout is sketched below):
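As a rough illustration of how the flavour selection could work (the directory layout and flavour name below are hypothetical), each flavour could be an overlay that patches the HPA shipped in the shared base:

```yaml
# Sketch: overlays/hpa-queue-length/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                        # existing serving catalog manifests
patches:
  - path: hpa-queue-length-patch.yaml # swaps in the queue-length metric
    target:
      kind: HorizontalPodAutoscaler
```

A user would then select a flavour with e.g. `kubectl apply -k overlays/hpa-queue-length`.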
References: