face-hugger

This repository is a minimal example of converting a Hugging Face model to ONNX and then hosting it on Triton deployed to Kubernetes.

ONNX Export

This repository uses Hugging Face Optimum to convert a transformer model to the ONNX format. To avoid spending the serving pod's resources on the export, I use a Helm chart hook that runs the export and saves the graph to a persistent volume. The serving pod then loads the model from that volume for inference.
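As a rough illustration of the hand-off between the export hook and the serving pod, the sketch below copies an already exported ONNX file into the directory layout Triton expects in a model repository (`<repository>/<model name>/<version>/model.onnx`). The function name, the model name `face-hugger`, and the paths in the usage note are assumptions for illustration and are not taken from `export_model_to_repository.py`.

```python
import shutil
from pathlib import Path


def place_in_model_repository(onnx_file: Path, repo_root: Path,
                              model_name: str, version: int = 1) -> Path:
    """Copy an exported ONNX graph into a Triton-style model repository.

    By default Triton looks for an ONNX model at
    <repo_root>/<model_name>/<version>/model.onnx.
    """
    version_dir = repo_root / model_name / str(version)
    # On a persistent volume the directory may already exist from an
    # earlier hook run, so creating it must be idempotent.
    version_dir.mkdir(parents=True, exist_ok=True)
    target = version_dir / "model.onnx"
    shutil.copyfile(onnx_file, target)
    return target
```

A hook job could call this as `place_in_model_repository(Path("export/model.onnx"), Path("/models"), "face-hugger")`, where `/models` stands in for wherever the persistent volume is mounted (a hypothetical mount point).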

TensorRT Conversion (In Progress)

Since TensorRT gives better performance on GPU than even an ONNX graph optimized at level 99, we will try to convert the ONNX graph to TensorRT and host that instead.
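Because this conversion is still in progress, the following is only a sketch of one possible route and not what the repository does: building the argument list for NVIDIA's `trtexec` command-line tool, which can turn an ONNX file into a serialized TensorRT engine. The helper name and the file paths are assumptions; the flags `--onnx`, `--saveEngine`, and `--fp16` are standard `trtexec` options.

```python
from typing import List


def trtexec_command(onnx_file: str, engine_file: str,
                    fp16: bool = False) -> List[str]:
    """Build (but do not run) a trtexec invocation for ONNX -> TensorRT."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_file}",          # input ONNX graph
        f"--saveEngine={engine_file}",  # where to write the built engine
    ]
    if fp16:
        # Allow half-precision kernels; accuracy should be checked afterwards.
        cmd.append("--fp16")
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` in an environment that has TensorRT installed. If the engine is to be served by Triton, `model.plan` is the default file name Triton expects for a TensorRT model.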

Triton

This graph is then hosted for inference on a Triton server deployed in Kubernetes. The server is exposed through a Kubernetes LoadBalancer, through which outside requests can reach the model in Triton.
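To make the request path concrete, here is a minimal sketch of how a client outside the cluster might build an inference request against Triton's HTTP endpoint, which follows the KServe v2 inference protocol (`POST /v2/models/<model name>/infer`). The model name, the input tensor name `input_ids`, the `INT64` datatype, and the address used in the usage note are assumptions; the real values depend on the exported model and on the files in `triton-config`.

```python
import json
import urllib.request
from typing import List


def build_infer_request(base_url: str, model_name: str,
                        input_name: str,
                        token_ids: List[int]) -> urllib.request.Request:
    """Build a KServe v2 style inference request for one token sequence."""
    payload = {
        "inputs": [
            {
                "name": input_name,              # assumed input tensor name
                "shape": [1, len(token_ids)],    # batch of one sequence
                "datatype": "INT64",             # assumed tensor datatype
                "data": token_ids,
            }
        ]
    }
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the LoadBalancer's external address as `base_url` (Triton's HTTP port is 8000 by default) and send the request with `urllib.request.urlopen(request)`. Building the request separately from sending it keeps the payload easy to inspect without a running server.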

I wanted to use Triton to see whether it is a better MLOps solution for inference, and to learn more about TensorRT.

sam-h-bean/face-hugger

Folders and files

Latest commit

History

Repository files navigation

face-hugger

ONNX Export

TensorRT Conversion (In Progress)

Triton

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages