We provide simple examples of how to integrate PyTorch, TensorFlow2, JAX, and plain Python models with the Triton Inference Server using PyTriton. The examples are available in the GitHub repository.
The list of example model deployments:
- Add-Sub Python model
- Add-Sub Python model Jupyter Notebook
- BART PyTorch from HuggingFace
- BERT JAX from HuggingFace
- Identity Python model
- Linear RAPIDS/CuPy model
- Linear RAPIDS/CuPy model Jupyter Notebook
- Linear PyTorch model
- Multi-Layer TensorFlow2
- Multi-instance deployment for ResNet50 PyTorch model
- Multi-model deployment for Python models
- NeMo Megatron GPT model with multi-node support
- OPT JAX from HuggingFace with multi-node support
- ResNet50 PyTorch from HuggingFace
- Stable Diffusion 1.5 from HuggingFace
- Using custom HTTP/gRPC headers and parameters
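To give a flavor of what these examples look like, below is a minimal sketch in the spirit of the Add-Sub Python model. The inference logic is plain NumPy; the `serve()` function shows how it could be bound to Triton with PyTriton (the tensor names, shapes, and `max_batch_size` here are illustrative choices, not fixed by the examples):

```python
import numpy as np


def add_sub(a: np.ndarray, b: np.ndarray) -> dict:
    """Elementwise sum and difference for a batch of inputs."""
    return {"add": a + b, "sub": a - b}


def serve():
    """Bind and serve the model with PyTriton.

    Sketch only; requires the `nvidia-pytriton` package to be installed.
    """
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    # @batch unpacks incoming requests into batched numpy arrays
    infer = batch(add_sub)

    with Triton() as triton:
        triton.bind(
            model_name="AddSub",  # illustrative name
            infer_func=infer,
            inputs=[
                Tensor(name="a", dtype=np.float32, shape=(-1,)),
                Tensor(name="b", dtype=np.float32, shape=(-1,)),
            ],
            outputs=[
                Tensor(name="add", dtype=np.float32, shape=(-1,)),
                Tensor(name="sub", dtype=np.float32, shape=(-1,)),
            ],
            config=ModelConfig(max_batch_size=128),
        )
        triton.serve()  # blocks, exposing HTTP/gRPC endpoints


# serve()  # uncomment to start the server
```

The same pattern (a callable plus `triton.bind`) underlies the PyTorch, TensorFlow2, and JAX examples; only the body of the inference function changes.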
The Perf Analyzer can be used to profile models served through PyTriton. We have prepared an example of using Perf Analyzer to profile the BART PyTorch model. See the example code in the GitHub repository.
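A profiling run with Perf Analyzer against a PyTriton-served model might look like the following; the model name, endpoint, and concurrency range are assumptions for illustration, not values from the BART example:

```shell
# Profile a model served by PyTriton over HTTP (default port 8000),
# sweeping client concurrency from 1 to 8.
perf_analyzer \
    -m BART \
    -u localhost:8000 \
    --concurrency-range 1:8
```

Perf Analyzer reports throughput and latency percentiles per concurrency level, which helps pick batching and instance settings.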
The following examples include a guide on how to deploy them on a Kubernetes cluster: