The example presents a simple `Linear` model implemented in PyTorch.

The example consists of the following scripts:

- `server.py` - starts the model with Triton Inference Server
- `client.py` - sends HTTP/gRPC requests to the deployed model
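For orientation, a minimal sketch of what `server.py` might contain is shown below. It binds a PyTorch `Linear` module to Triton through PyTriton; the tensor names (`INPUT_1`, `OUTPUT_1`), model dimensions, and `max_batch_size` are illustrative assumptions, not necessarily the values used in the actual example.

```python
import numpy as np
import torch

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

# Hypothetical model dimensions; the actual example may use different ones.
model = torch.nn.Linear(2, 3).eval()


@batch
def infer_fn(INPUT_1):
    # PyTriton delivers inputs as numpy arrays keyed by tensor name.
    input_tensor = torch.from_numpy(INPUT_1).float()
    with torch.no_grad():
        output_tensor = model(input_tensor)
    return {"OUTPUT_1": output_tensor.numpy()}


with Triton() as triton:
    triton.bind(
        model_name="Linear",
        infer_func=infer_fn,
        inputs=[Tensor(name="INPUT_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="OUTPUT_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=128),
    )
    triton.serve()  # blocks until interrupted; HTTP on :8000, gRPC on :8001 by default
```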
The example requires the `torch` package. It can be installed in your current environment using pip:
```bash
pip install torch
```
Alternatively, you can use the NVIDIA PyTorch container:
```bash
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:24.10-py3 bash
```
If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
The step-by-step guide:
- Install PyTriton following the installation instructions
- In the current terminal, start the model on Triton using `server.py`:

  ```bash
  ./server.py
  ```
- Open a new terminal tab (e.g. `Ctrl + T` on Ubuntu) or window
- Go to the example directory
- Run `client.py` to perform queries on the model:

  ```bash
  ./client.py
  ```
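As a rough sketch of what the client side might look like with PyTriton's `ModelClient` (the model name `Linear` and tensor name `INPUT_1` follow the assumptions made in the server sketch above):

```python
import numpy as np

from pytriton.client import ModelClient

# Hypothetical batch matching the INPUT_1 spec assumed in the server sketch.
input_batch = np.random.rand(4, 2).astype(np.float32)

# "localhost:8000" is Triton's default HTTP endpoint.
with ModelClient("localhost:8000", "Linear") as client:
    result_dict = client.infer_batch(INPUT_1=input_batch)

print(result_dict["OUTPUT_1"])
```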