New documentation site for Seldon Core v2 (SeldonIO#5760)

New format compatible with GitBook

* moved docs out of the source directory and removed Sphinx-related files
* APIs section completed
* changing the configuration section in the getting started guide
* getting started section completed
* rearranged models directory and enhanced different docs
* added most images in the docs to the images directory
* moved outliers and drift docs to their own files in the root directory
* deleted servers directory and moved servers.md to the root directory with enhancements
* deleted pipelines dir and moved pipelines.md to the root directory
* deleted inference dir and moved inference.md to the root directory
* deleted explainers dir and moved explainers.md to the root directory
* deleted performance-tests dir and moved .md to the root directory
* deleted experiments dir and moved .md to the root directory
* updated about section to match GitBook's expected format
* updated FAQs section to match GitBook's expected format
* updated pandas query section with choice1.yaml
* mostly moved and renamed files and directories
* updated SUMMARY.md for GitBook
* adding additional images
* restructured development dir
* restructured and reformatted examples dir to match GitBook's md flavor
* added GitBook format to metrics dir
* restructured k8s directory to match GitBook's expected md flavor
* reformatted cli dir
* typos and links fixed
* typos and links fixed
* tentative structure added to the root of the docs
* fixed names in kubernetes section
* GITBOOK-1: changed hard-coded reference to scheduler.proto
* added reference to chainer.proto instead of hard-coded version
* removed hard-coded references and added GitHub Gist pointing to v2 branch
* fixed format and broken links

feat(docs): adding a mention of per component labels and annotations to the docs (SeldonIO#5931)

feat(docs): add documentation for HPA-based autoscaling (SeldonIO#5935)

This describes a solution for scaling both Models and Servers based on HPA for the case of single-model serving. In the example described in the docs, the scaling is done based on Model RPS metrics fetched from Prometheus.
lc525 · Oct 2, 2024 · b10947a
1 parent 753ec3a
commit b10947a
Showing 384 changed files with 50,426 additions and 4,597 deletions.
diff --git a/docs-gb/README.md b/docs-gb/README.md
@@ -0,0 +1,63 @@
+# About
+
+Seldon V2 APIs provide a state-of-the-art solution for machine learning inference that
+can be run locally on a laptop as well as on Kubernetes for production.
+
+{% embed url="https://www.youtube.com/watch?v=ar5lSG_idh4" %}
+
+## Features
+
+* A single platform for inference of a wide range of standard and custom artifacts.
+* Deploy locally in Docker during development and testing of models.
+* Deploy at scale on Kubernetes for production.
+* Deploy anything from single models to multi-step pipelines.
+* Save infrastructure costs by deploying multiple models transparently in inference servers.
+* Overcommit on resources to deploy more models than available memory.
+* Dynamically extend models with pipelines, using a data-centric perspective backed by Kafka.
+* Explain individual models and pipelines with state-of-the-art explanation techniques.
+* Deploy drift and outlier detectors alongside models.
+* Kubernetes service mesh agnostic: use the service mesh of your choice.
+
+
+## Core features and comparison to Seldon Core V1 APIs
+
+Our V2 APIs split core tasks into separate resources, so users can get started quickly
+by deploying a Model and then progress to more complex Pipelines, Explanations and Experiments.
+
+![intro](images/intro.png)
+
+## Multi-model serving
+
+Seldon will transparently provision your model onto the correct inference server.
+
+![mms1](images/multimodel1.png)
+
+By packing multiple models onto a smaller set of servers, users can save infrastructure costs and
+efficiently utilize their models.
+
+![mms2](images/mms.png)
+
+By allowing over-commit, users can provision more models than the available memory resources would hold,
+because Seldon transparently unloads models that are not in use.
+
+![mms3](images/overcommit.png)
+
+## Inference Servers
+
+Seldon V2 supports any V2 protocol inference server. At present, Seldon's MLServer and NVIDIA's Triton inference server are included automatically on install. These servers cover a wide range of artifacts, including custom Python models.
+
+![servers](images/servers.png)
+
+## Service Mesh Agnostic
+
+Seldon Core v2 can be integrated with any Kubernetes service mesh. There are current examples with Istio, Ambassador and Traefik.
+
+![mesh](images/mesh.png)
+
+## Publication
+
+These features are influenced by our position paper on the next generation of ML model serving frameworks:
+
+*Title*: [Desiderata for next generation of ML model serving](http://arxiv.org/abs/2210.14665)
+
+*Workshop*: Challenges in deploying and monitoring ML systems workshop - NeurIPS 2022
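
The over-commit behavior described in the README's Multi-model serving section can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the class name `OvercommitServer`, its methods, and the least-recently-used eviction policy are hypothetical choices made for illustration, not Seldon's implementation, which the README does not specify.

```python
from collections import OrderedDict


class OvercommitServer:
    """Toy server that registers more models than fit in memory at once.

    Hypothetical illustration only: names and the LRU policy are assumptions.
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.registered = {}         # model name -> memory it needs when loaded
        self.loaded = OrderedDict()  # models in memory, least recently used first

    def register(self, name, memory):
        # Registering does not load the model; it only records its footprint.
        if memory > self.memory_capacity:
            raise ValueError(f"{name} can never fit on this server")
        self.registered[name] = memory

    def infer(self, name):
        memory = self.registered[name]
        if name in self.loaded:
            # Mark the model as most recently used.
            self.loaded.move_to_end(name)
            return f"{name}: served from memory"
        # Unload least recently used models until the requested one fits.
        while sum(self.loaded.values()) + memory > self.memory_capacity:
            self.loaded.popitem(last=False)
        self.loaded[name] = memory
        return f"{name}: loaded on demand"


# Three models of size 1 registered on a server with room for two.
server = OvercommitServer(memory_capacity=2)
for model in ("a", "b", "c"):
    server.register(model, 1)
for model in ("a", "b", "c", "a"):
    server.infer(model)
```

In this sketch all three models stay registered and callable, while only two are resident in memory at any time; a request for an unloaded model triggers a reload at the cost of evicting another.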
diff --git a/docs-gb/SUMMARY.md b/docs-gb/SUMMARY.md
@@ -0,0 +1,127 @@
+# Table of contents
+
+* [Home](README.md)
+* [Getting Started](getting-started/README.md)
+  * [Docker Installation](getting-started/docker-installation.md)
+  * [Kubernetes Installation](getting-started/kubernetes-installation/README.md)
+    * [Ansible](getting-started/kubernetes-installation/ansible.md)
+    * [Helm](getting-started/kubernetes-installation/helm.md)
+    * [Security](getting-started/kubernetes-installation/security/README.md)
+      * [AWS MSK mTLS](getting-started/kubernetes-installation/security/aws-msk-mtls.md)
+      * [AWS MSK SASL](getting-started/kubernetes-installation/security/aws-msk-sasl.md)
+      * [Azure Event Hub SASL Example](getting-started/kubernetes-installation/security/azure-event-hub-sasl.md)
+      * [Confluent Cloud OAuth 2.0 Example](getting-started/kubernetes-installation/security/confluent-oauth.md)
+      * [Confluent Cloud SASL Example](getting-started/kubernetes-installation/security/confluent-sasl.md)
+      * [Strimzi mTLS Example](getting-started/kubernetes-installation/security/strimzi-mtls.md)
+      * [Strimzi SASL Example](getting-started/kubernetes-installation/security/strimzi-sasl.md)
+      * [Reference](getting-started/kubernetes-installation/security/reference.md)
+  * [Configuration](getting-started/configuration.md)
+  * [Seldon CLI](getting-started/cli.md)
+* [APIs](apis/README.md)
+  * [Internal](apis/internal/README.md)
+    * [Chainer](apis/internal/chainer.md)
+    * [Agent](apis/internal/agent.md)
+  * [Inference](apis/inference/README.md)
+    * [Open Inference Protocol](apis/inference/v2.md)
+  * [Scheduler](apis/scheduler.md)
+* [Architecture](architecture/README.md)
+  * [DataFlow](architecture/dataflow.md)
+* [Examples](examples/README.md)
+  * [Local examples](examples/local-examples.md)
+  * [Kubernetes examples](examples/k8s-examples.md)
+  * [Hugging Face models](examples/huggingface.md)
+  * [Model zoo](examples/model-zoo.md)
+  * [Artifact versions](examples/multi-version.md)
+  * [Pipeline examples](examples/pipeline-examples.md)
+  * [Pipeline to pipeline examples](examples/pipeline-to-pipeline.md)
+  * [Explainer examples](examples/explainer-examples.md)
+  * [Custom Servers](examples/custom-servers.md)
+  * [Local experiments](examples/local-experiments.md)
+  * [Experiment version examples](examples/experiment-versions.md)
+  * [Inference examples](examples/inference.md)
+  * [Tritonclient examples](examples/tritonclient-examples.md)
+  * [Batch Inference examples (Kubernetes)](examples/batch-examples-k8s.md)
+  * [Batch Inference examples (local)](examples/batch-examples-local.md)
+  * [Checking Pipeline readiness](examples/pipeline-ready-and-metadata.md)
+  * [Multi-Namespace Kubernetes](examples/k8s-clusterwide.md)
+  * [Hugging Face speech to sentiment with explanations pipeline](examples/speech-to-sentiment.md)
+  * [Production image classifier with drift and outlier monitoring](examples/cifar10.md)
+  * [Production income classifier with drift, outlier and explanations](examples/income.md)
+  * [Conditional pipeline with pandas query model](examples/pandasquery.md)
+  * [Kubernetes Server with PVC](examples/k8s-pvc.md)
+  * [Local Overcommit](examples/k8s-pvc.md)
+* [Kubernetes](kubernetes/README.md)
+  * [Scaling](kubernetes/scaling.md)
+  * [Autoscaling](kubernetes/autoscaling.md)
+  * [HPA Autoscaling in single-model serving](kubernetes/hpa-rps-autoscaling.md)
+  * [Tracing](kubernetes/tracing.md)
+  * [Storage Secrets](kubernetes/storage-secrets.md)
+  * [Kafka](kubernetes/kafka.md)
+  * [Metrics](kubernetes/metrics.md)
+  * [Resources](kubernetes/resources/README.md)
+    * [Model](kubernetes/resources/model.md)
+    * [Experiment](kubernetes/resources/experiment.md)
+    * [Pipeline](kubernetes/resources/pipeline.md)
+    * [Server](kubernetes/resources/server.md)
+    * [Server Config](kubernetes/resources/serverconfig.md)
+    * [Seldon Runtime](kubernetes/resources/seldonruntime.md)
+    * [Seldon Config](kubernetes/resources/seldonconfig.md)
+  * [Service Meshes](kubernetes/service-meshes/README.md)
+    * [Ambassador](kubernetes/service-meshes/ambassador.md)
+    * [Istio](kubernetes/service-meshes/istio.md)
+    * [Traefik](kubernetes/service-meshes/traefik.md)
+* [Resource allocation](resource-allocation/README.md)
+  * [Example: Serving models on dedicated GPU nodes](resource-allocation/example-serving-models-on-dedicated-gpu-nodes.md)
+* [Models](models/README.md)
+  * [Multi-Model Serving](models/mms.md)
+  * [Inference Artifacts](models/inference-artifacts.md)
+  * [rClone](models/rclone.md)
+  * [Parameterized Models](models/parameterized-models/README.md)
+  * [Pandas Query](models/parameterized-models/pandasquery.md)
+* [Metrics](metrics/README.md)
+  * [Usage](metrics/usage.md)
+  * [Operational](metrics/operational.md)
+  * [Local Metrics](metrics/local-metrics-test.md)
+* [Development](development/README.md)
+  * [License](development/licenses.md)
+  * [Release](development/release.md)
+* [CLI](cli/README.md)
+  * [Seldon](cli/seldon.md)
+  * [Config](cli/seldon\_config.md)
+    * [Config Activate](cli/seldon\_config\_activate.md)
+    * [Config Deactivate](cli/seldon\_config\_deactivate.md)
+    * [Config Add](cli/seldon\_config\_add.md)
+    * [Config List](cli/seldon\_config\_list.md)
+    * [Config Remove](cli/seldon\_config\_remove.md)
+  * [Experiment](cli/seldon\_experiment.md)
+    * [Experiment Start](cli/seldon\_experiment\_start.md)
+    * [Experiment Status](cli/seldon\_experiment\_status.md)
+    * [Experiment List](cli/seldon\_experiment\_list.md)
+    * [Experiment Stop](cli/seldon\_experiment\_stop.md)
+  * [Model](cli/seldon\_model.md)
+    * [Model Status](cli/seldon\_model\_status.md)
+    * [Model Load](cli/seldon\_model\_load.md)
+    * [Model List](cli/seldon\_model\_list.md)
+    * [Model Infer](cli/seldon\_model\_infer.md)
+    * [Model Metadata](cli/seldon\_model\_metadata.md)
+    * [Model Unload](cli/seldon\_model\_unload.md)
+  * [Pipeline](cli/seldon\_pipeline.md)
+    * [Pipeline Load](cli/seldon\_pipeline\_load.md)
+    * [Pipeline Status](cli/seldon\_pipeline\_status.md)
+    * [Pipeline List](cli/seldon\_pipeline\_list.md)
+    * [Pipeline Inspect](cli/seldon\_pipeline\_inspect.md)
+    * [Pipeline Infer](cli/seldon\_pipeline\_infer.md)
+    * [Pipeline Unload](cli/seldon\_pipeline\_unload.md)
+  * [Server](cli/seldon\_server.md)
+    * [Server List](cli/seldon\_server\_list.md)
+    * [Server Status](cli/seldon\_server\_status.md)
+* [Pipelines](pipelines.md)
+* [Experiments](experiments.md)
+* [Servers](servers.md)
+* [Inference](inference.md)
+* [Outlier Detection](outlier.md)
+* [Drift Detection](drift.md)
+* [Explainers](explainers.md)
+* [Performance Tests](performance-tests.md)
+* [Upgrading](upgrading.md)
+* [FAQ](faqs.md)
diff --git a/docs-gb/apis/README.md b/docs-gb/apis/README.md
@@ -0,0 +1,7 @@
+# APIs
+
+Seldon provides APIs for management and inference.
+
+* [API for inference](./inference/README.md)
+* [Scheduler API for management](./scheduler.md) (Advanced)
+* [Internal APIs](./internal/README.md) (Reference)
diff --git a/docs/source/contents/apis/inference/index.md b/docs-gb/apis/inference/README.md
@@ -2,17 +2,6 @@
 
 Seldon inference servers must respect the following API specification.
 
- * [Seldon, KServe, NVIDIA V2 Inference API Spec](./v2.md)
+* [Seldon, KServe, NVIDIA V2 Inference API Spec](./v2.md)
 
 In future, Seldon may provide extensions for use with Pipelines, Experiments and Explainers.
-
-```{toctree}
-:maxdepth: 1
-:hidden:
-
-v2.md
-```
-
-
-
-
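
For readers unfamiliar with the V2 inference API that this README links to, the following minimal sketch builds the REST path and JSON body of an inference request. It assumes the standard V2 (Open Inference Protocol) REST layout of `/v2/models/{model_name}/infer` with an `inputs` list of named tensors; the helper function names, the model name `iris`, and the input name `predict` are hypothetical and chosen only for illustration. Nothing is sent over the network.

```python
import json


def infer_path(model_name):
    """Return the REST path for a V2 protocol inference call."""
    return f"/v2/models/{model_name}/infer"


def build_infer_request(input_name, shape, datatype, data):
    """Build a V2 inference request body holding a single input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,     # tensor name expected by the model
                "shape": list(shape),   # tensor dimensions
                "datatype": datatype,   # V2 datatype string such as "FP32"
                "data": list(data),     # tensor values, flattened row-major
            }
        ]
    }


# Hypothetical model and tensor names, for illustration only.
path = infer_path("iris")
body = build_infer_request("predict", [1, 4], "FP32", [5.1, 3.5, 1.4, 0.2])
payload = json.dumps(body)
```

The resulting `payload` string is what a client would POST to `path` on the inference endpoint, using whatever host and port the deployment exposes.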