[Update] added a note about deprecation of seldon and tensorflow protocol (#6182)

* added a note about deprecation of seldon and tensorflow protocol

* updated the note.

* Update the note

* Update optimization.md

* Update upgrading.md

* Update iris.ipynb

* Update backwards_compatability.ipynb

* Update protocol_examples.ipynb

* Update server_examples.ipynb

* rephrased the wordings and made the OIP the default tab

* removed the tabs

* added to include mlserver

* updated the link to mlserver docs

* incorporate Mathew's suggestion

* OIP first and then V2

---------

Co-authored-by: Rakavitha Kodhandapani <[email protected]>
Rajakavitha1 and Rakavitha Kodhandapani authored Jan 17, 2025
1 parent 5509dc0 commit 801ddf4
Showing 8 changed files with 78 additions and 44 deletions.
7 changes: 6 additions & 1 deletion doc/source/analytics/explainers.md
@@ -45,6 +45,11 @@ For an e2e example, please check AnchorTabular notebook [here](../examples/iris_

## Explain API

**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).

We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.


For the Seldon Protocol an endpoint path will be exposed for:

```
@@ -86,9 +91,9 @@ The explain method is also supported for tensorflow and Open Inference protocols

| Protocol | URI |
| ------ | ----- |
| v2 | `http://<host>/<ingress-path>/v2/models/<model-name>/infer` |
| seldon | `http://<host>/<ingress-path>/api/v1.0/explain` |
| tensorflow | `http://<host>/<ingress-path>/v1/models/<model-name>:explain` |


Note: for the `tensorflow` protocol we support a similar non-standard extension as for the [prediction API](../graph/protocols.md#rest-and-grpc-tensorflow-protocol), `http://<host>/<ingress-path>/v1/models/:explain`.
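
To make the endpoint table above concrete, here is a minimal sketch of a REST explain call using the Seldon protocol; the host, ingress path, deployment name, and feature values are illustrative placeholders, not values taken from these docs.

```python
import requests

# Hypothetical ingress host and deployment path -- adjust for your cluster.
HOST = "http://localhost:8003"
INGRESS_PATH = "seldon/seldon/income-explainer/default"

# Seldon-protocol request body using the ndarray form.
payload = {"data": {"ndarray": [[39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]]}}

response = requests.post(
    f"{HOST}/{INGRESS_PATH}/api/v1.0/explain",
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

For the OIP/v2 route in the table, the same idea applies with the `/v2/models/<model-name>/infer` path and an Open Inference Protocol request body.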
77 changes: 42 additions & 35 deletions doc/source/graph/protocols.md
@@ -1,46 +1,20 @@
# Protocols

Tensorflow protocol is only available in version >=1.1.

Seldon Core supports the following data planes:

* [REST and gRPC Seldon protocol](#rest-and-grpc-seldon-protocol)
* [REST and gRPC Tensorflow Serving Protocol](#rest-and-grpc-tensorflow-protocol)
* [REST and gRPC Open Inference Protocol](#v2-protocol)

## REST and gRPC Seldon Protocol

* [REST Seldon Protocol](../reference/apis/index.html)

Seldon is the default protocol for SeldonDeployment resources. You can specify the gRPC protocol by setting `transport: grpc` in your SeldonDeployment resource or by ensuring all components in the graph have `endpoint.transport` set to grpc.

See [example notebook](../examples/protocol_examples.html).

## REST and gRPC Tensorflow Protocol

* [REST Tensorflow Protocol definition](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/api_rest.md).
* [gRPC Tensorflow Protocol definition](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service.proto).

Activate this protocol by specifying `protocol: tensorflow` and `transport: rest` or `transport: grpc` in your Seldon Deployment. See [example notebook](../examples/protocol_examples.html).
**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).

For Seldon graphs the protocol will work as expected for single model graphs for Tensorflow Serving servers running as the single model in the graph. For more complex graphs you can chain models:
We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.

* Sending the response from the first as a request to the second. This will be done automatically when you define a chain of models as a Seldon graph. It is up to the user to ensure the response of each chained model can be fed as a request to the next in the chain.
* Only Predict calls can be handled in multiple model chaining.
Tensorflow protocol is only available in version >=1.1.

Seldon Core supports the following data planes:

General considerations:
* REST and gRPC Open Inference Protocol
* REST and gRPC Seldon protocol
* REST and gRPC Tensorflow Serving Protocol

* Seldon components marked as MODELS, INPUT_TRANSFORMER and OUTPUT_TRANSFORMERS will allow a PredictionService Predict method to be called.
* GetModelStatus for any model in the graph is available.
* GetModelMetadata for any model in the graph is available.
* Combining and Routing with the Tensorflow protocol is not presently supported.
* `status` and `metadata` calls can be made for any model in the graph.
* A non-standard Seldon extension is available to call predict on the graph as a whole: `/v1/models/:predict`.
* The name of the model in the `graph` section of the SeldonDeployment spec must match the name of the model loaded onto the Tensorflow Server.


## Open Inference Protocol (or V2 protocol)
### REST and gRPC Open Inference Protocol

Seldon has collaborated with the [NVIDIA Triton Server
Project](https://github.com/triton-inference-server/server) and the [KServe
@@ -75,7 +49,7 @@ spec:
name: default
```
At present, the OIP or `v2` protocol is only supported in a subset of
pre-packaged inference servers.
In particular,

@@ -87,3 +61,36 @@ In particular,
| [MLFLOW_SERVER](../servers/mlflow.md) | ✅ | [Seldon MLServer](https://github.com/seldonio/mlserver) |

You can try out the `v2` protocol in [this example notebook](../examples/protocol_examples.html).
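
For illustration, a minimal Open Inference Protocol inference request could look like the sketch below; the host, ingress path, model name, and tensor values are assumptions made for the example rather than values from this repository.

```python
import requests

# Hypothetical ingress host, path, and model name -- replace with your own.
HOST = "http://localhost:8003"
INGRESS_PATH = "seldon/seldon/iris/default"
MODEL_NAME = "classifier"

# Open Inference Protocol (V2) request body: named input tensors with an
# explicit shape, datatype, and a flat list of values.
payload = {
    "inputs": [
        {
            "name": "predict",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

response = requests.post(
    f"{HOST}/{INGRESS_PATH}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

The response follows the same convention, returning an `outputs` list of named tensors.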


### REST and gRPC Seldon Protocol
* [REST Seldon Protocol](../reference/apis/index.html)

Seldon is the default protocol for SeldonDeployment resources. You can specify the gRPC protocol by setting `transport: grpc` in your SeldonDeployment resource or by ensuring all components in the graph have `endpoint.transport` set to grpc.

See [example notebook](../examples/protocol_examples.html).

### REST and gRPC Tensorflow Protocol
* [REST Tensorflow Protocol definition](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/api_rest.md).
* [gRPC Tensorflow Protocol definition](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service.proto).

Activate this protocol by specifying `protocol: tensorflow` and `transport: rest` or `transport: grpc` in your Seldon Deployment. See [example notebook](../examples/protocol_examples.html).

For Seldon graphs the protocol will work as expected for single model graphs for Tensorflow Serving servers running as the single model in the graph. For more complex graphs you can chain models:

* Sending the response from the first as a request to the second. This will be done automatically when you define a chain of models as a Seldon graph. It is up to the user to ensure the response of each chained model can be fed as a request to the next in the chain.
* Only Predict calls can be handled in multiple model chaining.


General considerations:

* Seldon components marked as MODELS, INPUT_TRANSFORMER and OUTPUT_TRANSFORMERS will allow a PredictionService Predict method to be called.
* GetModelStatus for any model in the graph is available.
* GetModelMetadata for any model in the graph is available.
* Combining and Routing with the Tensorflow protocol is not presently supported.
* `status` and `metadata` calls can be made for any model in the graph.
* A non-standard Seldon extension is available to call predict on the graph as a whole: `/v1/models/:predict`.
* The name of the model in the `graph` section of the SeldonDeployment spec must match the name of the model loaded onto the Tensorflow Server.



5 changes: 5 additions & 0 deletions doc/source/production/optimization.md
@@ -9,6 +9,11 @@ Using the Seldon python wrapper there are various optimization areas one needs t

### Seldon Protocol Payload Types with REST and gRPC

**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).

We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.


Depending on whether you want to use REST or gRPC and want to send tensor data the format of the request will have a deserialization/serialization cost in the python wrapper. This is investigated in a [python serialization notebook](../examples/python_serialization.html).
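
For concreteness, the sketch below shows the two Seldon-protocol REST payload shapes being compared for the same 1x4 input (the numeric values are placeholders); the relative (de)serialization cost of each form is what the linked notebook measures.

```python
# ndarray form: nested Python lists, convenient for ad hoc requests.
ndarray_payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

# tensor form: an explicit shape plus a flat list of values.
tensor_payload = {
    "data": {"tensor": {"shape": [1, 4], "values": [5.1, 3.5, 1.4, 0.2]}}
}
```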

The conclusions are:
5 changes: 4 additions & 1 deletion doc/source/reference/upgrading.md
@@ -93,10 +93,13 @@ Only the v1 versions of the CRD will be supported moving forward. The v1beta1 ve
### Model Health Checks
**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).
We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.
We have updated the health checks done by Seldon for the model nodes in your inference graph. If `executor.fullHealthChecks` is set to true then:
* For Seldon protocol each node will be probed with `/api/v1.0/health/status`.
* For the Open Inference Protocol (or V2 protocol) each node will be probed with `/v2/health/ready`.
* For tensorflow just TCP checks will be run on the http endpoint.

By default we have set `executor.fullHealthChecks` to false for 1.14 release as users would need to rebuild their custom python models if they have not implemented the `health_status` method. In future we will default to `true`.
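
As a rough illustration of these probes (the host and port below are hypothetical), each protocol's check could be exercised manually as follows:

```python
import socket

import requests

# Hypothetical model-node address -- adjust for your deployment.
HOST, PORT = "model-node", 9000
BASE = f"http://{HOST}:{PORT}"

# Seldon protocol: HTTP health/status probe.
seldon_ok = requests.get(f"{BASE}/api/v1.0/health/status", timeout=5).ok

# Open Inference Protocol (V2): HTTP readiness probe.
v2_ok = requests.get(f"{BASE}/v2/health/ready", timeout=5).ok

# Tensorflow protocol: only a TCP connection check on the HTTP port is run.
with socket.create_connection((HOST, PORT), timeout=5):
    tcp_ok = True

print(seldon_ok, v2_ok, tcp_ok)
```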

6 changes: 5 additions & 1 deletion examples/models/lightgbm_custom_server/iris.ipynb
@@ -7,9 +7,13 @@
"source": [
"# Custom LightGBM Prepackaged Model Server\n",
"\n",
"**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).\n",
"\n",
"We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.\n",
"\n",
"In this notebook we create a new custom LIGHTGBM_SERVER prepackaged server with two versions:\n",
" * A Seldon protocol LightGBM model server\n",
" * A KfServing V2 protocol version using MLServer for running lightgbm models\n",
" * A KfServing Open Inference protocol or V2 protocol version using MLServer for running lightgbm models\n",
"\n",
"The Seldon model server is in defined in `lightgbmserver` folder.\n",
"\n",
5 changes: 4 additions & 1 deletion notebooks/backwards_compatability.ipynb
@@ -12,7 +12,10 @@
" * curl\n",
" * grpcurl\n",
" * pygmentize\n",
" \n",
"\n",
"**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).\n",
"\n",
"We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.\n",
"\n",
"## Setup Seldon Core\n",
"\n",
7 changes: 5 additions & 2 deletions notebooks/protocol_examples.ipynb
@@ -15,10 +15,13 @@
" \n",
"## Examples\n",
"\n",
" * [Open Inference Protocol or V2 Protocol](#V2-Protocol-Model)\n",
" * [Seldon Protocol](#Seldon-Protocol-Model)\n",
" * [Tensorflow Protocol](#Tensorflow-Protocol-Model)\n",
" * [V2 Protocol](#V2-Protocol-Model)\n",
" \n",
"\n",
"**Note**:Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).\n",
"\n",
"We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.\n",
"\n",
"## Setup Seldon Core\n",
"\n",
10 changes: 7 additions & 3 deletions notebooks/server_examples.ipynb
@@ -65,11 +65,15 @@
"source": [
"## Serve SKLearn Iris Model\n",
"\n",
"**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).\n",
"\n",
"We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.\n",
"\n",
"In order to deploy SKLearn artifacts, we can leverage the [pre-packaged SKLearn inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/sklearn.html).\n",
"The exposed API can follow either:\n",
"- Open Inference Protocol or V2 Protocol.\n",
"- Seldon protocol. \n",
"\n",
"- The default Seldon protocol. \n",
"- The V2 protocol.\n",
"\n"
]
},
@@ -350,7 +354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### V2 protocol\n",
"### Open Inference Protocol or V2 Protocol\n",
"\n",
"For example, we can consider the config below:"
]
